Introduction and notation



This course is an introduction to some of the most ingenious ideas in 18th-century mathematics. Some of these ideas are so important that their discoverers' names appear on the side of the Eiffel Tower.




If you perform sufficiently well on the final exam, who knows where your name 
will end up? 



Structure 

The course is divided into two parts. 

In the first part we will see how to turn some natural physical and geometric problems into differential equations. Some of these will be ordinary differential equations (for functions of a single variable); others will be partial differential equations (for functions of several variables). How do these equations arise? In ordinary calculus of a single variable, maxima and minima of a function $\phi\colon\mathbb{R}\to\mathbb{R}$ at $x_0\in\mathbb{R}$ are found by differentiating and finding the zeros of the resulting function. The result is an equation for $x_0$.

Now suppose that you're interested in minimising or maximising something over a space of all functions! Suddenly you have to differentiate a functional $F(\phi)$, i.e. a function which eats a function and outputs a number. The resulting Euler-Lagrange equation is usually a differential equation for $\phi$. If $\phi$ itself is a function of several variables then it is likely that this Euler-Lagrange equation is a partial differential equation.

The physical and geometric problems we are interested in are very natural, for 
example: 

• What is the shortest path between two points in the plane? (i.e. minimise the length functional over all paths).

• What is the shape in the plane with maximal area amongst shapes of a fixed perimeter? (i.e. maximise the area bounded, under the constraint of fixed perimeter).

• What is the function $\phi\colon\Omega\to\mathbb{R}$ ($\Omega\subset\mathbb{R}^2$) which minimises the surface area of its graph subject to the boundary condition that $\phi|_{\partial\Omega}$ is some fixed function?

The first two give rise to ordinary differential equations (for example, the second becomes a harmonic oscillator equation for the (derivatives of the) components of the parametric curve bounding the shape). The last gives a quasilinear second order partial differential equation. I assume that you know how to solve harmonic oscillators and other ordinary differential equations, however...

...The second part of the course will develop techniques to solve some partial differential equations. We begin by treating first order equations (only involving first partial derivatives). These can be tackled by a technique called the method of characteristics, which originates in geometric optics. In optics with the speed of light set equal to 1, there is a first order PDE called the eikonal equation

$$\left(\frac{\partial\phi}{\partial x}\right)^2 + \left(\frac{\partial\phi}{\partial y}\right)^2 = 1$$

satisfied by the function $\phi(x,y)$ whose value at $(x,y)\in\mathbb{R}^2$ is the length of time it takes light to travel to $(x,y)$ from some fixed light-emitting curve $C\subset\mathbb{R}^2$ (the light being emitted in a normal direction to $C$). The obvious way to solve this is to draw a straight line connecting $C$ to $(x,y)$ and meeting $C$ at ninety degrees. Let $t$ be an arc-length parameter for this line $(x(t),y(t))$ and observe that if we define $f(t) = \phi(x(t),y(t))$ then $f(t)$ satisfies the ODE $df/dt = 1$ (simply because it takes light time $t$ to travel $t$ units, since the speed of light is 1). I'm not saying that this is obvious from the eikonal equation, but from the physical interpretation of the solution $\phi$ it's actually a tautology.
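As a quick numerical sanity check of the eikonal equation, here is a minimal Python sketch built on an assumption of our own: take $C$ to be the unit circle with light emitted outwards, so that outside the circle $\phi(x,y) = \sqrt{x^2+y^2} - 1$. The names `phi`, `eikonal_residual` and `along_ray` are hypothetical, and the gradient is approximated by central differences, so this illustrates the claims rather than proving them.

```python
import math

# Assumption for illustration: C is the unit circle and light leaves it along
# outward normals, so outside C the travel time is the distance to C.
def phi(x, y):
    """Travel time phi(x, y) from the unit circle, valid for x^2 + y^2 > 1."""
    return math.hypot(x, y) - 1.0

def eikonal_residual(x, y, h=1e-6):
    """Approximate (phi_x)^2 + (phi_y)^2 - 1 using central differences."""
    phi_x = (phi(x + h, y) - phi(x - h, y)) / (2 * h)
    phi_y = (phi(x, y + h) - phi(x, y - h)) / (2 * h)
    return phi_x**2 + phi_y**2 - 1.0

def along_ray(theta, t):
    """f(t) = phi(x(t), y(t)) on the normal ray leaving C at angle theta.

    The ray is parametrised by arc length t, starting on C at t = 0.
    """
    x = (1.0 + t) * math.cos(theta)
    y = (1.0 + t) * math.sin(theta)
    return phi(x, y)

if __name__ == "__main__":
    for point in [(2.0, 0.5), (-1.5, 3.0), (0.3, -4.0)]:
        print(point, eikonal_residual(*point))   # all close to zero
    print(along_ray(0.7, 2.5))                   # close to 2.5, i.e. f(t) = t
```

Along each ray the sketch gives $f(t) = t$, which is the ODE $df/dt = 1$ with $f(0) = 0$.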

The method of characteristics generalises this example and constructs, for any first order PDE, a system of "characteristic curves" (in this example the light rays emitted normally from $C$) along which the PDE reduces to an ODE.

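Finally we move on to tackle second order equations. We will concentrate on three which arise very naturally in physics and geometry:

• The heat equation

$$\frac{\partial\phi}{\partial t} = \frac{\partial^2\phi}{\partial x^2}.$$

• Laplace's equation

$$\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} = 0.$$

• The wave equation

$$\frac{\partial^2\phi}{\partial t^2} - \frac{\partial^2\phi}{\partial x^2} = 0.$$

These are linear and therefore highly amenable to solution. They also typify three different classes of second order equations with vastly different behaviours: parabolic, elliptic and hyperbolic.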

The general strategy of this final part of the course will be the following: 

• Find infinitely many special solutions to the equation. Maybe they don't satisfy the boundary conditions you're interested in, but don't worry. This step uses a clever idea called separation of variables: you assume the solution has the form $\phi(x,y) = X(x)Y(y)$, that is, a product of two functions each depending on only one of the variables. Then the PDE reduces to a pair of ODEs which are easy to solve. Of course, separation of variables rarely works, but with these simple linear equations it is fiercely effective.

• Take infinite linear combinations of these special solutions to fit your boundary conditions. By linearity of the PDE one can take linear combinations of solutions and obtain a function which still solves the equation. Fitting infinite sums of special functions (like sine and cosine) to arbitrary functions is the subject of Fourier theory, and indeed this is how Fourier discovered Fourier theory.
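To make the two steps concrete, here is a minimal Python sketch for one assumed problem that is not taken from these notes: the heat equation $\partial_t\phi = \partial_x^2\phi$ on $0 < x < \pi$ with $\phi(0,t) = \phi(\pi,t) = 0$ and initial profile $\phi(x,0) = x(\pi - x)$. Separation of variables gives the special solutions $e^{-n^2t}\sin(nx)$, and the Fourier sine coefficients $b_n = \frac{2}{\pi}\int_0^\pi \phi(x,0)\sin(nx)\,dx$ fit the initial condition. The helper names are hypothetical, and the integrals are approximated with Simpson's rule rather than done by hand.

```python
import math

def initial(x):
    """Assumed initial temperature phi(x, 0) = x (pi - x) on [0, pi]."""
    return x * (math.pi - x)

def sine_coefficient(f, n, num=2000):
    """b_n = (2/pi) * integral_0^pi f(x) sin(n x) dx by composite Simpson's rule.

    num is the (even) number of subintervals.
    """
    h = math.pi / num
    total = f(0.0) * math.sin(0.0) + f(math.pi) * math.sin(n * math.pi)
    for k in range(1, num):
        x = k * h
        weight = 4 if k % 2 == 1 else 2
        total += weight * f(x) * math.sin(n * x)
    return (2 / math.pi) * total * h / 3

def heat_solution(x, t, coeffs):
    """Superpose the separated solutions b_n exp(-n^2 t) sin(n x), n = 1, 2, ..."""
    return sum(b * math.exp(-n * n * t) * math.sin(n * x)
               for n, b in enumerate(coeffs, start=1))

if __name__ == "__main__":
    coeffs = [sine_coefficient(initial, n) for n in range(1, 40)]
    # For this profile b_n = 8 / (pi n^3) when n is odd and 0 when n is even.
    print(coeffs[0], 8 / math.pi)
    print(heat_solution(1.0, 0.0, coeffs), initial(1.0))  # series matches phi(x, 0)
    print(heat_solution(1.0, 1.0, coeffs))                # the profile has decayed
```

Each term satisfies the PDE and the boundary conditions on its own, so truncating the series only affects how well the initial profile is reproduced.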

There are also several appendices: the first is a recap of Fourier theory because we will use lots of Fourier series in the second half of the course; the second includes much of the Sage code I used to create the diagrams (I also used Inkscape); the other appendices comprise all the problem sheets for the course.
Videos

There are supplementary videos for this course, available on YouTube. These 
cover a mixture of basic material prerequisite for understanding the course and 
extension material for those who care to watch it. Links to these videos are in 
the relevant places in the notes. 



What you should already know 

You should be: 

• Comfortable computing Fourier series and manipulating trigonometric expressions. If you're struggling, try working through the Schaum Outline Series volume on Fourier Analysis.

• Able to solve a wide variety of ordinary differential equations. At the very least the simple harmonic equation $X'' = \lambda X$. If you're struggling, try working through the Schaum Outline Series volume on Ordinary Differential Equations.

• Able to remember and apply Green's theorem and have a rudimentary 
knowledge of vector calculus (div, grad and curl, tangent vectors to curves 
etc.). 

• Ready to try solving a problem for which you have not been given a precise 
template (most important). 



Notation 

Sections, like this one, which are marked with a left bar contain material I consider to be nonexaminable, usually proofs. That doesn't mean you should ignore them, just that you shouldn't sit there memorising them getting your knickers into a twist.

Partial derivatives are written with curly dees: $\frac{\partial f}{\partial x}$. Ordinary derivatives are written with Latin dees: $\frac{df}{dx}$. The only difference is that $\frac{d}{dx}$ means the function being differentiated is a function of a single variable. People who write partial derivatives with Latin dees are wrong and will be penalised. People who write ordinary derivatives with curly dees on the basis that single variable is a special case of several variables are smart, but still wrong, because it's misleading for the reader. If ever I write $\partial/\partial t$ instead of $d/dt$ and you don't know why I did it, challenge me (there's always the possibility that I made a mistake).

Also, if $x(t)$ is a function of time $t$, the notation $\dot{x}$ indicates differentiation with respect to $t$. If $y(x)$ is a function of $x$, the notation $y'$ indicates differentiation with respect to $x$. It's possible that $'$ might denote differentiation with respect to another variable like $t$. It shouldn't bother you because it will only ever occur when the function in question is a function of a single variable, so $x'$ or $\dot{x}$ just mean "differentiate $x$ with respect to whatever it depends on".



Acknowledgements

This course owes a lot to the course taught in previous years by Dr R Bowles and Dr G Esler. Their notes have been indispensable in preparing these notes; many of the nice questions from their problem sets have made it onto these problem sets. Arnold's book "Lectures on Partial Differential Equations" provided most of the inspiration for my presentation of the method of characteristics; Spiegel's "Fourier Analysis" has been useful too: I learned Fourier series from it myself, long ago. Thanks most of all to my extremely diligent markers (Giancarlo Grasso, Abbygail Shaw and Huda Ramli) and my wonderful and inquisitive second years for pointing out so many errors and typos in the lectures, in the notes and on the question sheets.
Chapter 1

Calculus with several variables



For most of what I say, we'll take "several" to mean "two" to save on notation. There are no new ideas introduced by adding more variables. Henceforth we'll use $(x,y)$ to denote Cartesian coordinates on the plane $\mathbb{R}^2$.



1.1 First derivatives 
1.1.1 Partial derivatives

Given a function $f\colon\mathbb{R}^2\to\mathbb{R}$, its partial derivatives are what we get by differentiating with respect to one of the variables $x$ or $y$ and keeping the other fixed. In other words, to get the partial derivative with respect to $x$, we restrict $f$ to the horizontal lines $y = y_0$ ($y_0$ is some constant) and we get a function of one variable

$$x\mapsto f(x,y_0)$$

which we know how to differentiate (assuming it's differentiable!). We define the first partial derivative of $f$ with respect to $x$ at $(x_0,y_0)$ as

$$\frac{\partial f}{\partial x}(x_0,y_0) = \left.\frac{d}{dx}f(x,y_0)\right|_{x=x_0} = \lim_{h\to 0}\frac{f(x_0+h,y_0)-f(x_0,y_0)}{h}.$$

Similarly

$$\frac{\partial f}{\partial y}(x_0,y_0) = \left.\frac{d}{dy}f(x_0,y)\right|_{y=y_0} = \lim_{k\to 0}\frac{f(x_0,y_0+k)-f(x_0,y_0)}{k}.$$



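Example 1. 1. If $f(x,y) = x^2y^3$ then $\frac{\partial f}{\partial x} = 2xy^3$ and $\frac{\partial f}{\partial y} = 3x^2y^2$.

2. If $f(x,y) = \sin(x+y)$ then $\frac{\partial f}{\partial x} = \cos(x+y) = \frac{\partial f}{\partial y}$.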
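Partial derivatives can be sanity-checked numerically by holding one variable fixed and taking a difference quotient in the other, exactly as in the definition above. The following Python sketch does this for the two functions of Example 1; the helper names `partial_x` and `partial_y` are hypothetical, and a central difference with a small step `h` stands in for the limit.

```python
import math

def partial_x(f, x, y, h=1e-6):
    """Approximate df/dx at (x, y) with y held fixed (central difference)."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def partial_y(f, x, y, h=1e-6):
    """Approximate df/dy at (x, y) with x held fixed (central difference)."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def f1(x, y):
    return x**2 * y**3          # Example 1, part 1

def f2(x, y):
    return math.sin(x + y)      # Example 1, part 2

if __name__ == "__main__":
    x, y = 1.3, -0.7
    print(partial_x(f1, x, y), 2 * x * y**3)      # both close to -0.8918
    print(partial_y(f1, x, y), 3 * x**2 * y**2)   # both close to 2.4843
    print(partial_x(f2, x, y), partial_y(f2, x, y), math.cos(x + y))
```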



We will sometimes abbreviate $\frac{\partial f}{\partial x}$ to $\partial_x f$ or just $f_x$. Often you will see it written $f_{,x}$ to distinguish the subscript from an ordinary subscript (like a vector index).



1.1.2 Directional derivative 

Why should we restrict attention to the lines $x = x_0$ or $y = y_0$? Let's pick a vector $v\in\mathbb{R}^2$ and look at all lines pointing in the direction $v$, i.e. all lines of the form

$$\{p + tv : t\in\mathbb{R}\}.$$

Suppose we want to compute the derivative of $f$ at $p = (x_0,y_0)$ in the direction of $v = (v_1,v_2)$. Restrict $f$ to the line $(x_0+tv_1, y_0+tv_2)$ and differentiate with respect to $t$ to get what we call the directional derivative of $f$ in the direction $v$ at the point $p$:

$$v_p(f) = \left.\frac{d}{dt}f(x_0+tv_1,y_0+tv_2)\right|_{t=0}.$$

Certainly if $v = (1,0)$ then we get

$$v_p(f) = \frac{\partial f}{\partial x}(p),$$

or if $v = (0,1)$ then we get

$$v_p(f) = \frac{\partial f}{\partial y}(p).$$



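Theorem 2. Suppose that the partial derivatives of $f$ with respect to $x$ and $y$ exist everywhere and that they vary continuously. Let $v = (v_1,v_2)$ be a vector and $p\in\mathbb{R}^2$ be a point. Then

$$v_p(f) = v_1\frac{\partial f}{\partial x}(p) + v_2\frac{\partial f}{\partial y}(p).$$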

This theorem is a nice piece of analysis and therefore we won't prove it. You 
can tell it's nontrivial because it's not immediately obvious what continuity of 
the partial derivatives has to do with anything. 
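Although we won't prove Theorem 2, it is easy to test numerically. The Python sketch below compares a difference quotient along the line $p + tv$ with the combination $v_1\,\partial_x f(p) + v_2\,\partial_y f(p)$ for one smooth function chosen purely for illustration; the function and helper names are hypothetical, and agreement to several decimal places is evidence, not proof.

```python
import math

def f(x, y):
    """An illustrative function with continuous partial derivatives."""
    return math.exp(x) * math.sin(y) + x * y**2

def directional_derivative(f, p, v, h=1e-6):
    """Approximate d/dt f(p + t v) at t = 0 by a central difference in t."""
    (x0, y0), (v1, v2) = p, v
    return (f(x0 + h * v1, y0 + h * v2) - f(x0 - h * v1, y0 - h * v2)) / (2 * h)

def via_partials(f, p, v):
    """v1 * df/dx(p) + v2 * df/dy(p), the right-hand side of Theorem 2."""
    fx = directional_derivative(f, p, (1.0, 0.0))   # v = (1, 0) gives df/dx
    fy = directional_derivative(f, p, (0.0, 1.0))   # v = (0, 1) gives df/dy
    return v[0] * fx + v[1] * fy

if __name__ == "__main__":
    p, v = (0.4, 1.1), (2.0, -3.0)
    print(directional_derivative(f, p, v), via_partials(f, p, v))
```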
The directional derivatives are just linear combinations of the directional derivatives in the $x$- and $y$-directions. What does this mean geometrically, in terms of the graph of $f$?

In single-variable calculus, vectors in $\mathbb{R}$ have just one component $v$. The directional derivative of a function $g\colon\mathbb{R}\to\mathbb{R}$ along the vector $v$ at $x$ is $v_x(g) = vg'(x)$. As $v$ varies this traces out a line with slope $g'(x)$, which is tangent to the graph of $g$ when we translate it to $(x,g(x))$.

In two variables, the directional derivative of $f\colon\mathbb{R}^2\to\mathbb{R}$ in the direction $v = (v_1,v_2)$ at $p = (x_0,y_0)$ is

$$v_p(f) = v_1\frac{\partial f}{\partial x}(p) + v_2\frac{\partial f}{\partial y}(p),$$

which traces out a plane as $v_1$ and $v_2$ vary. This plane is tangent to the graph of $f$ when it is translated to the point $(x_0,y_0,f(x_0,y_0))$. In other words, the geometric content of the above theorem is that the graph of a function with continuous partial derivatives admits a tangent plane.

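Example 3. You cannot drop the assumption of continuity of partial derivatives. A counterexample would be the function

$$f(x,y) = \begin{cases}\dfrac{xy}{x^2+y^2} & \text{when } (x,y)\neq(0,0),\\ 0 & \text{at } (x,y) = (0,0).\end{cases}$$

Both partial derivatives exist everywhere. They vanish at the origin (because $f$ vanishes identically along both axes), but, for example, away from the origin

$$\frac{\partial f}{\partial x} = \frac{y(y^2-x^2)}{(x^2+y^2)^2}$$

and along the $y$-axis this is $y^3/y^4 = 1/y$, which does not tend to zero as $y\to 0$; hence the partial derivatives are not continuous. The function is not even continuous when restricted to a line through the origin other than $x = 0$ or $y = 0$ (along $y = mx$ it takes the constant value $m/(1+m^2)\neq 0$ away from the origin), and hence the other directional derivatives at the origin don't even exist.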
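A few lines of Python make the pathology of Example 3 visible: along the line $y = mx$ the function is stuck at $m/(1+m^2)$ however close we get to the origin, while $\partial f/\partial x$ on the $y$-axis grows like $1/y$. The slope `m` and the sample points are arbitrary choices made for illustration.

```python
def f(x, y):
    """The function of Example 3: xy / (x^2 + y^2) off the origin, 0 at it."""
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x**2 + y**2)

def df_dx(x, y):
    """df/dx away from the origin: y (y^2 - x^2) / (x^2 + y^2)^2."""
    return y * (y**2 - x**2) / (x**2 + y**2) ** 2

if __name__ == "__main__":
    m = 2.0
    # Along y = m x the values do not approach f(0, 0) = 0.
    for t in [1e-1, 1e-4, 1e-8]:
        print(t, f(t, m * t), m / (1 + m**2))
    # Along the y-axis df/dx = 1/y, which blows up as y -> 0.
    for y in [1e-1, 1e-2, 1e-3]:
        print(y, df_dx(0.0, y))
```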



To get away from difficulties like this, we will only discuss functions with continuous first partial derivatives, also called $C^1$-functions, or once continuously differentiable functions.

In this case we can define the total derivative at $p$

$$d_pf\colon\mathbb{R}^2\to\mathbb{R}$$

which encodes all directional derivatives. The total derivative eats a vector $v$ and outputs the directional derivative in the $v$-direction at $p$, that is

$$d_pf(v) = v_p(f).$$
Figure 1.1: The graph of the function from Example 3, which has discontinuous partial derivatives and hence no tangent plane at the origin.

By Theorem 2, at each point this is a linear map

$$d_pf(v) = v_1\frac{\partial f}{\partial x}(p) + v_2\frac{\partial f}{\partial y}(p),$$

so we can write $df$ as a row vector (also known as a covector)

$$df = \begin{pmatrix}\partial_x f & \partial_y f\end{pmatrix}.$$

1.1.3 Linear approximation 

The point of introducing the total derivative $d_pf\colon\mathbb{R}^2\to\mathbb{R}$ is that it is a good linear (or "first order") approximation to the original function $f\colon\mathbb{R}^2\to\mathbb{R}$ at the point $p$.
We first recall the situation in one dimension. Let $g\colon\mathbb{R}\to\mathbb{R}$ be a differentiable function of one variable. By the definition of the derivative we have the following fact:

$$g(x_0+\varepsilon) = g(x_0) + \varepsilon g'(x_0) + \varepsilon\eta(\varepsilon)$$

where $\eta(\varepsilon)\to 0$ as $\varepsilon\to 0$. In other words, $g(x_0)+\varepsilon g'(x_0)$ is a good approximation of $g(x_0+\varepsilon)$ for small $\varepsilon$. This is actually equivalent to the usual definition because by rearranging we have

$$\frac{g(x_0+\varepsilon)-g(x_0)}{\varepsilon} = g'(x_0) + \eta(\varepsilon).$$

Let $f\colon\mathbb{R}^2\to\mathbb{R}$ be a function with continuous partial derivatives. Let $v = (v_1,v_2)$ be a vector and $p = (x_0,y_0)$ a point. Since the above approximation holds in all directions, using Theorem 2 to express the directional derivative in the $(v_1,v_2)$-direction in terms of the partial derivatives, we get

$$f(x_0+v_1,y_0+v_2) = f(x_0,y_0) + v_1\partial_xf(x_0,y_0) + v_2\partial_yf(x_0,y_0) + |v|\eta(|v|)$$

where $\eta(\varepsilon)\to 0$ as $\varepsilon\to 0$. In other words,

$$f(x_0+v_1,y_0+v_2) = f(x_0,y_0) + d_pf(v) + |v|\eta(|v|).$$
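The statement that $\eta(|v|)\to 0$ can be watched numerically. In this Python sketch, with the illustrative function $f(x,y) = e^x\cos y$ and its hand-computed partial derivatives, the quantity $|f(p+v) - f(p) - d_pf(v)|/|v|$, which is $|\eta(|v|)|$, shrinks roughly in proportion to $|v|$ as $v$ shrinks along a fixed unit direction. The point `p`, the direction and the function names are assumptions made for the example.

```python
import math

def f(x, y):
    return math.exp(x) * math.cos(y)          # illustrative C^1 function

def grad_f(x, y):
    """Exact partial derivatives (df/dx, df/dy) of f."""
    return math.exp(x) * math.cos(y), -math.exp(x) * math.sin(y)

def eta(p, direction, s):
    """|f(p + v) - f(p) - d_pf(v)| / |v| for v = s * direction (a unit vector)."""
    x0, y0 = p
    v1, v2 = s * direction[0], s * direction[1]
    fx, fy = grad_f(x0, y0)
    linear = f(x0, y0) + v1 * fx + v2 * fy    # f(p) + d_pf(v)
    return abs(f(x0 + v1, y0 + v2) - linear) / math.hypot(v1, v2)

if __name__ == "__main__":
    p, direction = (0.2, 0.5), (0.6, 0.8)
    for s in [1e-1, 1e-2, 1e-3]:
        print(s, eta(p, direction, s))        # decreases roughly like s
```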

Remark 4. The total derivative $d_pf$ is a good approximation to $f$ at $p$ in the same way that the tangent plane to the graph $\mathrm{Graph}(f)$ of $f$ at $(x_0,y_0,f(x_0,y_0))$ in $\mathbb{R}^3$ stays close to the graph itself in a neighbourhood of $p$. In fact, the tangent plane at $p$ to $\mathrm{Graph}(f)$ is the graph of $d_pf$, translated to the point $(x_0,y_0,f(x_0,y_0))$.



1.1.4 The gradient 

Another, equivalent way of packaging the information in the total derivative is to give the gradient $\nabla f$ of the function $f$. This is the column vector

$$\nabla f = \begin{pmatrix}\partial_x f\\ \partial_y f\end{pmatrix}$$

(with more vertical entries if there are more coordinates). Note that by definition:

$$d_pf(v) = (\nabla f)(p)\cdot v.$$

Lemma 5. The gradient $(\nabla f)(p)$ points in the direction of maximal increase of $f$ at $p$. Moreover, its magnitude equals the directional derivative in the corresponding unit direction $(\nabla f)(p)/|(\nabla f)(p)|$.
Figure 1.2: The graph of the total derivative $d_pf$ translated to the point $(p,f(p))$ is the tangent plane to $\mathrm{Graph}(f)$ at $(p,f(p))$.

Proof. Let's assume that $p$ is not a critical point of $f$ (that is, $d_pf\neq 0$), otherwise the Lemma is obvious (because both $d_pf$ and $(\nabla f)(p)$ vanish). If $v$ is a unit vector then $d_pf(v) = (\nabla f)(p)\cdot v = |(\nabla f)(p)||v|\cos(\theta)$, where $\theta$ is the angle between $v$ and $(\nabla f)(p)$. This is clearly maximal when $\cos(\theta) = 1$, that is when $\theta = 0$. The directional derivative of $f$ in the $(\nabla f)(p)/|(\nabla f)(p)|$-direction is

$$d_pf\left(\frac{(\nabla f)(p)}{|(\nabla f)(p)|}\right) = \frac{(\nabla f)(p)\cdot(\nabla f)(p)}{|(\nabla f)(p)|} = |(\nabla f)(p)|.$$

□
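Lemma 5 can be checked by brute force: scan many unit vectors $v$, evaluate $d_pf(v) = (\nabla f)(p)\cdot v$, and see that the best one lines up with the gradient and gives the value $|(\nabla f)(p)|$. The function $f(x,y) = x^2y + \sin y$, the point and the number of sampled directions are illustrative assumptions in this Python sketch.

```python
import math

def grad_f(x, y):
    """Gradient of the illustrative function f(x, y) = x^2 y + sin(y)."""
    return 2 * x * y, x**2 + math.cos(y)

def best_direction(p, samples=3600):
    """Return (max value of d_pf(v), maximising v) over sampled unit vectors v."""
    gx, gy = grad_f(*p)
    best_value, best_v = -math.inf, None
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        v = (math.cos(angle), math.sin(angle))
        value = gx * v[0] + gy * v[1]          # d_pf(v) = (grad f)(p) . v
        if value > best_value:
            best_value, best_v = value, v
    return best_value, best_v

if __name__ == "__main__":
    p = (1.0, 0.5)
    gx, gy = grad_f(*p)
    norm = math.hypot(gx, gy)
    value, v = best_direction(p)
    print(value, norm)                  # maximal rate of increase vs |grad f|
    print(v, (gx / norm, gy / norm))    # maximising direction vs unit gradient
```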



1.1.5 The matrix of partial derivatives 

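In general for a vector-valued function $F = (F_1,\ldots,F_m)\colon\mathbb{R}^n\to\mathbb{R}^m$ there are $n\times m$ partial derivatives, which we write in an $m$-by-$n$ matrix

$$dF = \begin{pmatrix}\frac{\partial F_1}{\partial x_1} & \cdots & \frac{\partial F_1}{\partial x_n}\\ \vdots & \ddots & \vdots\\ \frac{\partial F_m}{\partial x_1} & \cdots & \frac{\partial F_m}{\partial x_n}\end{pmatrix}.$$

In particular, for a change of variables $F\colon\mathbb{R}^2\to\mathbb{R}^2$,

$$F(x,y) = (u(x,y),v(x,y)),$$

we define the Jacobian matrix

$$\begin{pmatrix}\frac{\partial u}{\partial x} & \frac{\partial u}{\partial y}\\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y}\end{pmatrix}.$$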

The naturality of writing this as a matrix comes from the following theorem. 

Theorem 6 (Chain rule). If $F\colon\mathbb{R}^n\to\mathbb{R}^m$ and $G\colon\mathbb{R}^m\to\mathbb{R}^k$ are differentiable functions then

$$d_p(G\circ F) = d_{F(p)}G\circ d_pF$$

where $\circ$ denotes composition of functions on the left-hand side and matrix multiplication on the right.

We first make explicit note of a useful special case. 

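Corollary 7. Suppose that $F\colon\mathbb{R}^2\to\mathbb{R}^2$ is a change of coordinates, $(u,v) = F(x,y)$, and $G\colon\mathbb{R}^2\to\mathbb{R}$ is a function. The function $G\circ F$ expresses $G(u,v)$ in terms of the coordinates $(x,y)$ and we have

$$\begin{pmatrix}\frac{\partial G}{\partial x} & \frac{\partial G}{\partial y}\end{pmatrix} = \begin{pmatrix}\frac{\partial G}{\partial u} & \frac{\partial G}{\partial v}\end{pmatrix}\begin{pmatrix}\frac{\partial u}{\partial x} & \frac{\partial u}{\partial y}\\ \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y}\end{pmatrix},$$

that is,

$$\frac{\partial G}{\partial x} = \frac{\partial G}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial G}{\partial v}\frac{\partial v}{\partial x},\qquad \frac{\partial G}{\partial y} = \frac{\partial G}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial G}{\partial v}\frac{\partial v}{\partial y}.$$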
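Here is a numerical check of Corollary 7 in Python, for a change of variables and a function $G$ that are assumptions chosen only for illustration: $(u,v) = (x^2-y^2, 2xy)$ and $G(u,v) = ue^v$. The left-hand side differentiates $G\circ F$ directly by central differences; the right-hand side multiplies the row vector $(\partial_uG, \partial_vG)$ by the Jacobian matrix, with all partial derivatives computed by hand.

```python
import math

def F(x, y):
    """Hypothetical change of variables (u, v) = (x^2 - y^2, 2xy)."""
    return x**2 - y**2, 2 * x * y

def G(u, v):
    """Hypothetical function of (u, v)."""
    return u * math.exp(v)

def lhs(x, y, h=1e-6):
    """(dG/dx, dG/dy) for G o F, by central differences in x and in y."""
    gx = (G(*F(x + h, y)) - G(*F(x - h, y))) / (2 * h)
    gy = (G(*F(x, y + h)) - G(*F(x, y - h))) / (2 * h)
    return gx, gy

def rhs(x, y):
    """Row vector (G_u, G_v) times the Jacobian of F, as in Corollary 7."""
    u, v = F(x, y)
    G_u, G_v = math.exp(v), u * math.exp(v)
    u_x, u_y = 2 * x, -2 * y
    v_x, v_y = 2 * y, 2 * x
    return G_u * u_x + G_v * v_x, G_u * u_y + G_v * v_y

if __name__ == "__main__":
    for point in [(0.3, -0.4), (1.1, 0.2)]:
        print(lhs(*point), rhs(*point))
```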



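Proof of Theorem 6. See http://youtu.be/BtgBCPdT8rM. Let $v$ be a vector and $\varepsilon$ a parameter which we assume takes on small values. We have

$$F(p+\varepsilon v) = F(p) + \varepsilon d_pF(v) + \varepsilon|v|\eta(\varepsilon|v|)$$

and

$$G(u+\varepsilon w) = G(u) + \varepsilon d_uG(w) + \varepsilon|w|\zeta(\varepsilon|w|)$$

for some functions $\eta\colon\mathbb{R}\to\mathbb{R}^m$ and $\zeta\colon\mathbb{R}\to\mathbb{R}^k$ such that $|\eta(r)|$ and $|\zeta(r)|$ tend to zero as $r\to 0$. Therefore

$$G(F(p+\varepsilon v)) = G\big(F(p) + \varepsilon d_pF(v) + \varepsilon|v|\eta(\varepsilon|v|)\big)$$

so if we set $u = F(p)$ and $w = d_pF(v) + |v|\eta(\varepsilon|v|)$, we get

$$G(F(p+\varepsilon v)) = G(F(p)) + \varepsilon d_{F(p)}G\big(d_pF(v) + |v|\eta(\varepsilon|v|)\big) + \varepsilon|w|\zeta(\varepsilon|w|)$$

which equals

$$G(F(p)) + \varepsilon\, d_{F(p)}G\circ d_pF(v) + \varepsilon\Big(d_{F(p)}G\big(|v|\eta(\varepsilon|v|)\big) + |w|\zeta(\varepsilon|w|)\Big)$$

by linearity of $d_{F(p)}G$. The term

$$d_{F(p)}G\big(|v|\eta(\varepsilon|v|)\big) + |w|\zeta(\varepsilon|w|)$$

goes to zero as $\varepsilon\to 0$, and since this linear approximation determines our first derivative we know that

$$d_p(G\circ F) = d_{F(p)}G\circ d_pF.$$

□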

In case you find the level of abstraction in this proof intimidating, I suggest you try to prove the corollary explicitly in coordinates as it is stated. In case you find the proof fun, I suggest you go to Analysis 4 to see more where that came from.

1.1.6 Critical points 

We are interested in maxima and minima of functions. I just want to recall:

Definition 8. A function $f$ has a local maximum (respectively local minimum) at $p$ if there is a neighbourhood of $p$ such that $f(p)\geq f(x)$ (respectively $f(p)\leq f(x)$) for all $x$ in the neighbourhood.

In one variable, we have the following theorem: 

Theorem 9. Suppose that $g\colon\mathbb{R}\to\mathbb{R}$ is a differentiable function with a local maximum or a local minimum at $x_0\in\mathbb{R}$. Then $g'(x_0) = 0$.

Geometrically, you drop a horizontal line down onto the graph of $g$ until it hits. The point where it hits is clearly a local maximum and the horizontal line is tangent there.

Proof. Let's prove it for local maxima. Since $g$ is differentiable it can be approximated by

$$g(x_0+\varepsilon) = g(x_0) + \varepsilon g'(x_0) + \varepsilon\eta(\varepsilon)$$

where $\eta\to 0$ as $\varepsilon\to 0$. If $g'(x_0)\neq 0$ then it has a sign, $\pm$. We consider small $\varepsilon$ with the same sign. When the magnitude of $\varepsilon$ is sufficiently small, $\eta(\varepsilon)$ can be made arbitrarily small relative to $g'(x_0)$ and hence the error term $\varepsilon\eta(\varepsilon)$ can be made arbitrarily smaller in magnitude than $\varepsilon g'(x_0)$.

In particular, if

$$g(x_0) + \varepsilon g'(x_0) > g(x_0)$$

for very small $\varepsilon$ then

$$g(x_0+\varepsilon) = g(x_0) + \varepsilon g'(x_0) + \varepsilon\eta(\varepsilon) > g(x_0)$$

because the error term is not big enough to change the inequality. Because we took $\varepsilon$ to have the same sign as $g'(x_0)$ we certainly have $g(x_0) + \varepsilon g'(x_0) > g(x_0)$, and hence this is a contradiction to local maximality of $g$ at $x_0$. □

Figure 1.3: A function with a local maximum. Note that the tangent plane to the graph at the local maximum is horizontal.

In two variables, the geometric idea is the same: you drop a horizontal plane down onto the graph and when it hits it is tangent (see Figure 1.3).

Theorem 10. If $f\colon\mathbb{R}^2\to\mathbb{R}$ is a $C^1$-function with a local maximum or local minimum at $p = (x_0,y_0)$ then all directional derivatives vanish at $p$; in other words $d_pf = 0$.

Proof. Suppose $f$ has a local maximum or minimum at $p$. Let $\{(x_0+tv_1,y_0+tv_2) : t\in\mathbb{R}\}$ be the line through $p$ in the direction $v = (v_1,v_2)$. Restrict $f$ to this line; it still has a maximum or minimum at $t = 0$. Hence, by Theorem 9,

$$\left.\frac{d}{dt}f(x_0+tv_1,y_0+tv_2)\right|_{t=0} = 0,$$

but this is precisely $v_p(f)$.

□
Definition 11 (Critical point). Let $f\colon\mathbb{R}^2\to\mathbb{R}$ be a $C^1$-function. A point $p\in\mathbb{R}^2$ with $d_pf = 0$ is called a critical point. Equivalently, all directional derivatives of $f$ vanish at $p$, or the tangent plane of the graph of $f$ at $p$ is horizontal.

Not all critical points are maxima and minima. Even in one variable we have inflection points (think of $x^3$ at $x = 0$). The analogous critical points in two variables are the saddle points; see Figure 1.4.



1.2 Second derivatives 

In one variable there is a sufficient condition for a critical point to be a local maximum (or a local minimum) in terms of its second derivative. This is called the second derivative test and is proved using Taylor's theorem:

Theorem 12 (Taylor's theorem). Suppose that $g\colon\mathbb{R}\to\mathbb{R}$ is a twice differentiable function with derivatives $g'$ and $g''$. Then

$$g(x_0+\varepsilon) = g(x_0) + \varepsilon g'(x_0) + \frac{\varepsilon^2}{2}g''(x_0) + \varepsilon^2\eta(\varepsilon)$$

for some function $\eta$ which tends to zero as $\varepsilon\to 0$.

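Corollary 13 (Second derivative test). Suppose that $g\colon\mathbb{R}\to\mathbb{R}$ is a twice differentiable function and that $g$ has a critical point at $x_0$. Suppose moreover that $g''(x_0) < 0$. Then $g$ has a local maximum at $x_0$.

Proof. From Taylor's theorem:

$$\begin{aligned} g(x_0+\varepsilon) &= g(x_0) + \varepsilon g'(x_0) + \tfrac{1}{2}\varepsilon^2 g''(x_0) + \varepsilon^2\eta(\varepsilon)\\ &= g(x_0) + \varepsilon^2\left(\tfrac{1}{2}g''(x_0) + \eta(\varepsilon)\right)\end{aligned}$$

since $g'(x_0) = 0$ by the condition that $x_0$ is critical. Now by taking $\varepsilon$ small enough, we can ensure that $|\eta(\varepsilon)| < \frac{1}{2}|g''(x_0)|$, so if $g''(x_0) < 0$ then $\frac{1}{2}g''(x_0) + \eta(\varepsilon) < 0$ and hence

$$g(x_0+\varepsilon) < g(x_0)$$

for all small $\varepsilon\neq 0$. Hence $g$ has a local maximum at $x_0$. □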
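In code, the second derivative test is a sign check once $g''(x_0)$ is known. This Python sketch estimates $g''(x_0)$ with a symmetric second difference quotient and applies Corollary 13, together with the analogous statement for minima (the same proof with the inequalities reversed). The names `g`, `classify` and the tolerance `tol` are hypothetical choices; a second derivative that is numerically zero is reported as inconclusive, as for $x^3$ at $0$.

```python
def g(x):
    return x**3 - 3 * x                 # critical points at x = -1 and x = 1

def second_derivative(func, x, h=1e-4):
    """Approximate func''(x) by the symmetric second difference quotient."""
    return (func(x + h) - 2 * func(x) + func(x - h)) / h**2

def classify(func, x0, tol=1e-6):
    """Second derivative test at a critical point x0 of func."""
    s = second_derivative(func, x0)
    if s < -tol:
        return "local maximum"
    if s > tol:
        return "local minimum"
    return "inconclusive"

if __name__ == "__main__":
    print(classify(g, 1.0))                 # g''(1) = 6 > 0
    print(classify(g, -1.0))                # g''(-1) = -6 < 0
    print(classify(lambda x: x**3, 0.0))    # g''(0) = 0: the test says nothing
```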

We will now develop the formalism of second partial derivatives in two-variable calculus, prove the analogue of Taylor's theorem and explain the second derivative test for critical points in this context.
1.2.1 Definition

The second partial derivative

$$\frac{\partial^2 f}{\partial x^2}$$

of a function $f\colon\mathbb{R}^2\to\mathbb{R}$ is defined by differentiating the function $\frac{\partial f}{\partial x}$ with respect to $x$, keeping $y$ fixed. Similarly one can define partial derivatives

$$\frac{\partial^2 f}{\partial x\partial y},\quad \frac{\partial^2 f}{\partial y\partial x},\quad \frac{\partial^2 f}{\partial y^2}.$$

As before, to avoid analytical subtleties, we will assume that all partial derivatives up to second order exist and are continuous, a property which we will call $C^2$. With this assumption we have the following

Theorem 14 (Symmetry of mixed partial derivatives). If $f\colon\mathbb{R}^2\to\mathbb{R}$ is a $C^2$-function then

$$\frac{\partial^2 f}{\partial x\partial y} = \frac{\partial^2 f}{\partial y\partial x}.$$

Given the definition

$$v_p(f) = \left.\frac{d}{dt}f(x_0+tv_1,y_0+tv_2)\right|_{t=0}$$

of the directional derivative in the direction $v = (v_1,v_2)$ at $p = (x_0,y_0)$, the definition of a "second directional derivative" should be clear:

$$\left.\frac{d^2}{dt^2}f(x_0+tv_1,y_0+tv_2)\right|_{t=0}.$$

Equivalently, consider the directional derivative $v_p(f)$ as a function of $p$,

$$v(f)\colon\mathbb{R}^2\to\mathbb{R},$$

and take its directional derivative in the $v$ direction:

$$v_p(v(f)).$$

Henceforth we'll omit the subscript $p$, so this can be written as

$$v(v(f)).$$

Of course you could also differentiate $v(f)$ in the $w$-direction for a different vector $w$. Taking $v$ and $w$ to be standard basis vectors, you end up with the partial derivatives

$$\frac{\partial^2 f}{\partial x^2},\quad \frac{\partial^2 f}{\partial x\partial y},\quad \frac{\partial^2 f}{\partial y^2}.$$
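Lemma 15. If $v = (v_1,v_2)$ then

$$v(v(f)) = v_1^2\frac{\partial^2 f}{\partial x^2} + 2v_1v_2\frac{\partial^2 f}{\partial x\partial y} + v_2^2\frac{\partial^2 f}{\partial y^2}.$$

Proof. We know that

$$v(f) = (v_1\partial_x + v_2\partial_y)f$$

so

$$v(v(f)) = (v_1\partial_x + v_2\partial_y)(v_1\partial_x + v_2\partial_y)f = \left(v_1^2\partial_x^2 + 2v_1v_2\partial_x\partial_y + v_2^2\partial_y^2\right)f$$

by the binomial theorem, which applies because $\partial_x\partial_y f = \partial_y\partial_x f$ by Theorem 14.

□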



The generalisation to higher derivatives can be derived similarly and involves 
binomial coefficients. 
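Lemma 15 is easy to test numerically: the second derivative of $t\mapsto f(p+tv)$ at $t = 0$, estimated by a second difference, should match $v_1^2f_{xx} + 2v_1v_2f_{xy} + v_2^2f_{yy}$. In this Python sketch the function $f(x,y) = x^3y + \sin(xy)$, the point, the vector and the hand-computed second partials are illustrative assumptions.

```python
import math

def f(x, y):
    return x**3 * y + math.sin(x * y)

def second_partials(x, y):
    """Hand-computed (f_xx, f_xy, f_yy) for f(x, y) = x^3 y + sin(x y)."""
    f_xx = 6 * x * y - y**2 * math.sin(x * y)
    f_xy = 3 * x**2 + math.cos(x * y) - x * y * math.sin(x * y)
    f_yy = -x**2 * math.sin(x * y)
    return f_xx, f_xy, f_yy

def second_directional(p, v, h=1e-4):
    """Approximate d^2/dt^2 f(p + t v) at t = 0 by a second difference."""
    (x0, y0), (v1, v2) = p, v

    def along(t):
        return f(x0 + t * v1, y0 + t * v2)

    return (along(h) - 2 * along(0.0) + along(-h)) / h**2

def lemma_15(p, v):
    """v1^2 f_xx + 2 v1 v2 f_xy + v2^2 f_yy evaluated at p."""
    f_xx, f_xy, f_yy = second_partials(*p)
    v1, v2 = v
    return v1**2 * f_xx + 2 * v1 * v2 * f_xy + v2**2 * f_yy

if __name__ == "__main__":
    p, v = (0.5, 1.2), (1.5, -0.5)
    print(second_directional(p, v), lemma_15(p, v))
```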

1.2.2 The Hessian and Taylor's theorem 

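The expression

$$v(v(f)) = v_1^2\frac{\partial^2 f}{\partial x^2} + 2v_1v_2\frac{\partial^2 f}{\partial x\partial y} + v_2^2\frac{\partial^2 f}{\partial y^2}$$

is a quadratic form in the coefficients of $v$. We can write it as

$$v(v(f)) = \begin{pmatrix}v_1 & v_2\end{pmatrix}\begin{pmatrix}\frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x\partial y}\\ \frac{\partial^2 f}{\partial x\partial y} & \frac{\partial^2 f}{\partial y^2}\end{pmatrix}\begin{pmatrix}v_1\\ v_2\end{pmatrix}.$$

The two-by-two matrix is called the Hessian of $f$ at $p$, $\mathrm{Hess}_p(f)$. The quadratic form in the coefficients of $v$ is $v^T\mathrm{Hess}_p(f)v$.

Theorem 16. Suppose that $f\colon\mathbb{R}^2\to\mathbb{R}$ is a $C^2$-function. Then

$$f(p+v) = f(p) + d_pf(v) + \frac{1}{2}v^T\mathrm{Hess}_p(f)v + |v|^2\eta(|v|)$$

for some function $\eta(r)$ which tends to zero as $r\to 0$.

1.2.3 The second derivative test

Theorem 17. Suppose that $f\colon\mathbb{R}^2\to\mathbb{R}$ is a $C^2$-function and suppose moreover that $p$ is a critical point of $f$, i.e. $d_pf = 0$. If both eigenvalues of the matrix $\mathrm{Hess}_p(f)$ are negative (respectively positive) then $p$ is a local maximum (respectively local minimum).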
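Proof. First notice that $\mathrm{Hess}_p(f)$ is a symmetric matrix (by Theorem 14). Therefore its eigenvalues are both real, say $\lambda_1,\lambda_2\in\mathbb{R}$. Suppose they are both negative. We can diagonalise $\mathrm{Hess}_p(f)$ by an orthogonal matrix $U$:

$$U^T\mathrm{Hess}_p(f)U = \begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}.$$

Let $e_1 = (1,0)$ and $e_2 = (0,1)$ denote the basis vectors. Then $Ue_1$ is a $\lambda_1$-eigenvector for $\mathrm{Hess}_p(f)$. Taking the Taylor expansion we get

$$f(p+\varepsilon Ue_1) = f(p) + \varepsilon d_pf(Ue_1) + \frac{1}{2}\varepsilon^2 e_1^TU^T\mathrm{Hess}_p(f)Ue_1 + \varepsilon^2\eta(\varepsilon).$$

Since $p$ is a critical point, the first derivative term vanishes and we get

$$f(p+\varepsilon Ue_1) = f(p) + \frac{1}{2}\varepsilon^2 e_1^TU^T\mathrm{Hess}_p(f)Ue_1 + \varepsilon^2\eta(\varepsilon).$$

Using the fact that $U^T = U^{-1}$ (orthogonality of $U$), the second derivative term is

$$\frac{1}{2}\varepsilon^2 e_1^TU^T\mathrm{Hess}_p(f)Ue_1 = \frac{1}{2}\varepsilon^2\begin{pmatrix}1 & 0\end{pmatrix}\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}\begin{pmatrix}1\\ 0\end{pmatrix} = \frac{1}{2}\varepsilon^2\lambda_1 < 0$$

and similarly for $\lambda_2$. The error term can be made very small when $\varepsilon$ is very small, so that $f(p+\varepsilon Ue_1) < f(p)$. Similarly for $Ue_2$, and indeed for any $v = v_1Ue_1 + v_2Ue_2$ we get

$$f(p + v_1Ue_1 + v_2Ue_2) = f(p) + \frac{1}{2}(\lambda_1v_1^2 + \lambda_2v_2^2) + |v|^2\eta(|v|)$$

which is $< f(p)$ when $|v|$ is very small and nonzero: since $|v|^2 = v_1^2 + v_2^2$ and $\lambda_1,\lambda_2$ are both negative, the quadratic term is at most $\frac{1}{2}\max(\lambda_1,\lambda_2)|v|^2 < 0$, which dominates the error term. This proves that $p$ is a local maximum, and the same proof would demonstrate a local minimum in case both eigenvalues were positive. □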

Corollary 18. Suppose that $f\colon\mathbb{R}^2\to\mathbb{R}$ is a $C^2$-function and suppose moreover that $p$ is a critical point of $f$. If

$$\det(\mathrm{Hess}_p(f)) > 0$$

then $p$ is either a local maximum or a local minimum of $f$. If moreover

$$\mathrm{Trace}(\mathrm{Hess}_p(f)) < 0\quad(\text{respectively} > 0)$$

then $p$ is a local maximum (respectively local minimum).
Proof. Since $\mathrm{Hess}_p(f)$ can be diagonalised by an orthogonal matrix $U$,

$$\det(\mathrm{Hess}_p(f)) = \det\left(U\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}U^T\right) = \det(U)\lambda_1\lambda_2\det(U^T) = \lambda_1\lambda_2$$

since $\det(U) = \det(U^T) = \pm 1$.

If $\det(\mathrm{Hess}_p(f)) > 0$ this means that $\lambda_1$ and $\lambda_2$ are nonzero and have the same sign. Hence Theorem 17 applies.

Of course, we cannot tell whether $\lambda_1$ and $\lambda_2$ are both positive or both negative without further information, and hence we don't know whether $p$ is a maximum or a minimum. In the case when $\lambda_1$ and $\lambda_2$ have the same sign, it suffices to compute the trace of $\mathrm{Hess}_p(f)$, since this is equal to $\lambda_1+\lambda_2$, which has the same sign as both $\lambda_1$ and $\lambda_2$. This proves the corollary. □
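Corollary 18, together with the saddle and degenerate cases described after it, turns into a short decision procedure on the entries of a $2\times 2$ symmetric Hessian. The Python sketch below is one hypothetical way to write it; it assumes the Hessian at a critical point has already been computed, and it reports degenerate critical points rather than guessing.

```python
def classify_critical_point(hessian):
    """Classify a critical point from its 2x2 Hessian [[f_xx, f_xy], [f_yx, f_yy]].

    det > 0: local max (trace < 0) or local min (trace > 0), by Corollary 18.
    det < 0: eigenvalues of opposite signs, a saddle point.
    det = 0: degenerate; the second derivative test is inconclusive.
    """
    (a, b), (c, d) = hessian
    if b != c:
        raise ValueError("the Hessian of a C^2 function is symmetric")
    det = a * d - b * c
    trace = a + d
    if det > 0:
        # det > 0 forces both eigenvalues to share a sign, so trace != 0.
        return "local maximum" if trace < 0 else "local minimum"
    if det < 0:
        return "saddle point"
    return "degenerate"

if __name__ == "__main__":
    print(classify_critical_point([[-2, 1], [1, -3]]))   # local maximum
    print(classify_critical_point([[2, 0], [0, 2]]))     # local minimum
    print(classify_critical_point([[1, 0], [0, -1]]))    # saddle point
    print(classify_critical_point([[1, 1], [1, 1]]))     # degenerate
```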

Remark 19. When there are eigenvalues equal to zero, the Taylor series argument doesn't work: in the corresponding eigendirections the second order term vanishes, so it no longer dominates the error term. We call such critical points degenerate. They are more complicated.

Definition 20. A critical point $p$ of a function $f\colon\mathbb{R}^2\to\mathbb{R}$ is called nondegenerate if all the eigenvalues of $\mathrm{Hess}_p(f)$ are nonzero. Equivalently, if $\det(\mathrm{Hess}_p(f))\neq 0$.

So what do things look like at a nondegenerate critical point if the determinant of the Hessian is negative? That is, if one of the eigenvalues is positive and the other is negative? This is called a saddle point.

Example 21 (A saddle point). Consider the function

$$f(x,y) = \frac{1}{2}(x^2 - y^2).$$

The origin is a critical point and the Hessian is

$$\mathrm{Hess}_0(f) = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}$$

with eigenvalues $\pm 1$. We can see from Figure 1.4 that the function has a maximum along the $y$-direction and a minimum along the $x$-direction.



1.2.4 Higher dimensions

See http://youtu.be/CNZDEPGKzcA

In higher dimensions the Hessian is a larger matrix, but symmetry of partial derivatives means that it is still symmetric. More complicated nondegenerate critical points are possible, but because we can always diagonalise a symmetric matrix, these critical points will always look like one of a finite list of models depending on how many positive and negative eigenvalues the Hessian has. These models are

$$f(x_1,\ldots,x_n) = -\sum_{k=1}^{m}x_k^2 + \sum_{k=m+1}^{n}x_k^2$$

and are known as quadratic or Morse singularities.

Figure 1.4: A function with a saddle point.

Note that the diagonalisation argument still works: if $p$ is a critical point of $f$, $u_1,\ldots,u_n$ is an orthonormal basis of eigenvectors for $\mathrm{Hess}_p(f)$ with eigenvalues $\lambda_1,\ldots,\lambda_n$, and $v = v_1u_1 + \cdots + v_nu_n$ is a vector, then the Taylor expansion is

$$f(p+v) = f(p) + \frac{1}{2}(\lambda_1v_1^2 + \cdots + \lambda_nv_n^2) + \cdots,$$

so a nondegenerate critical point is still a local maximum when all the $\lambda_k$ are negative and a local minimum when all the $\lambda_k$ are positive.
1.3 Constrained optimisation

1.3.1 The geometric idea...

We are often interested in solving a maximisation/minimisation problem with an extra constraint.

Example 22. • How close does the ellipse $x^2/a^2 + y^2/b^2 = 1$ get to the point $p$? This is quite easy: just parametrise the ellipse as $x = a\cos(t)$, $y = b\sin(t)$, write the distance between $(x(t),y(t))$ and $p$ as a function of $t$ and find and analyse the stationary points.

• How close does the surface $x^3 + y^3 + z^3 = 3xyz$ get to the point $(0,0,1)$? It's not so easy to find a convenient parametrisation of this cubic surface!

For the sake of clarity we restrict attention to functions $f\colon\mathbb{R}^2\to\mathbb{R}$ of two variables and, given a curve $C\subset\mathbb{R}^2$, we restrict the function $f$ to $C$ and obtain a new constrained function $h = f|_C$. The question we want to answer is: what are the critical points of $h$?



There are two ways in which this can happen (see Figure 1.5):

• It is possible that a critical point of $f$ happens to lie on $C$. For example, take $f(x,y) = x^2 - y^2$ and the curve $C = \{x = 0\}$. The restriction of $f$ to $C$ is the function $h(y) = -y^2$ (where $y$ is the remaining coordinate on $C$). This curve passes through the saddle point at $(0,0)$ and $h$ has a maximum there.

• It is also possible that $f$ has no critical points and yet $f|_C$ has some. For instance, take $f(x,y) = x$: since $\partial f/\partial x = 1$ this has no critical points. But if we restrict to the circle $C = \{x^2 + y^2 = 1\}$ then certainly $h(\theta) = \cos(\theta)$ has a maximum at $\theta = 0$ and a minimum at $\theta = \pi$ (where $\theta$ is the angular coordinate on the circle). Notice that these two critical points occur at the points where the level sets of $f$ (the vertical lines) are tangent to the circle $C$. This is no coincidence...
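The circle example can be explored numerically. This Python sketch scans the angular coordinate, finds where $h(\theta) = \cos\theta$ is largest and smallest, and evaluates $d_pf$ on the tangent vector $(-\sin\theta,\cos\theta)$ of the circle there; since $df = (1,0)$ this is $-\sin\theta$, which vanishes exactly at the two extrema, i.e. where the vertical level sets of $f$ are tangent to $C$. The grid size and function names are illustrative assumptions.

```python
import math

def h(theta):
    """f(x, y) = x restricted to the unit circle, in the angular coordinate."""
    return math.cos(theta)

def df_on_tangent(theta):
    """d_pf applied to the circle's tangent (-sin t, cos t) at p = (cos t, sin t).

    For f(x, y) = x we have df = (1, 0) at every point.
    """
    tangent = (-math.sin(theta), math.cos(theta))
    return 1.0 * tangent[0] + 0.0 * tangent[1]

def extreme_angles(samples=3600):
    """Grid search for the angles where h is largest and smallest."""
    grid = [2 * math.pi * k / samples for k in range(samples)]
    return max(grid, key=h), min(grid, key=h)

if __name__ == "__main__":
    theta_max, theta_min = extreme_angles()
    print(theta_max, theta_min)                                 # 0 and (about) pi
    print(df_on_tangent(theta_max), df_on_tangent(theta_min))   # both about 0
    print(df_on_tangent(1.0))                    # nonzero away from the extrema
```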



Theorem 23 (see Figure 1.5). Let $f\colon\mathbb{R}^2\to\mathbb{R}$ be a $C^1$-function and $C\subset\mathbb{R}^2$ a curve in the plane. Let $h$ denote the restriction $f|_C$. Then $h$ has a critical point at $p\in C$ if and only if either

• $f$ has a critical point at $p$, or

• the level set $f^{-1}(f(p))$ and the curve $C$ have the same tangent line at $p$.

I should add the caveat that we need $C$ to be a nice smooth curve. In particular, we require that near every point $p\in C$ there is a $C^1$-parametrisation $\gamma\colon(-T,T)\to C$ with $\gamma(0) = p$ and $\dot\gamma(0)\neq 0$. Some day you will see a theorem (maybe you already have) called the implicit function theorem which tells you that this parametrisation exists provided that $C$ can be written as $g^{-1}(0)$ for a function $g$ which has no critical points on the level set $g^{-1}(0)$. Crucially, we don't need to know the parametrisations to apply the theorem, we just need to know they exist. Similarly, the level sets of $f$ admit local parametrisations near $p$ provided $p$ is not a critical point of $f$.

Figure 1.5: Examples illustrating Theorem 23. In one case (left), $C$ runs through a critical point of $f$ and inherits a critical point itself. In the other (right), the dashed contours of $f$ are tangent to $C$ at the two marked points where $h = f|_C$ has a minimum and a maximum.

We start with a lemma to help us understand tangent lines to level sets: 

Lemma 24. If $f: \mathbb{R}^2 \to \mathbb{R}$ is a $C^1$-function and $f^{-1}(r)$ is one of its level sets (assumed to contain no critical points) then the tangent line to $f^{-1}(r)$ at $p \in f^{-1}(r)$ is precisely the set of vectors $v$ such that $d_pf(v) = 0$.

Proof of Lemma 24. Take a parametrisation $\gamma: (-T, T) \to f^{-1}(r)$ with $\gamma(0) = p$ and $\dot\gamma(0) \neq 0$. The tangent vectors we are looking for are precisely the multiples of $\dot\gamma(0)$, so we want to show that these are annihilated by $d_pf$.

We know that
$$f \circ \gamma \equiv r$$
because the image of $\gamma$ is (by definition) inside the level set $f^{-1}(r)$. Therefore $d_0(f \circ \gamma) = 0$. By the chain rule we get
$$0 = d_0(f \circ \gamma) = d_{\gamma(0)}f(\dot\gamma(0)).$$
So $d_pf$ annihilates $\dot\gamma(0)$, as required.

We also want to show that the kernel is precisely the set of vectors $\lambda\dot\gamma(0)$. Since $p$ is not a critical point of $f$ we know that there exists a vector $v$ with $d_pf(v) \neq 0$; in particular $v$ is not a multiple of $\dot\gamma(0)$. If $u$ is any vector then $u = \lambda\dot\gamma(0) + \mu v$ for some $\lambda, \mu \in \mathbb{R}$ and hence
$$d_pf(u) = \mu\, d_pf(v).$$
This proves that if $d_pf(u) = 0$ then $\mu = 0$ and $u$ is a multiple of $\dot\gamma(0)$. □



Proof of Theorem 23. Take $p \in C$ and let $\gamma: (-T, T) \to C$ be a parametrisation of a neighbourhood of $p$ in the curve $C$, with $\gamma(0) = p$. We are asking when $p$ is a critical point of $h$, i.e. when $s = 0$ is a critical point of $s \mapsto f(\gamma(s))$. But $f(\gamma(s)) = (f \circ \gamma)(s)$ and by the chain rule
$$d_0(f \circ \gamma) = d_pf(\dot\gamma(0)).$$
This vanishes if and only if the direction $\dot\gamma(0)$ (which is tangent to $C$) is annihilated by $d_pf$. This certainly happens if $p$ is a critical point of $f$, which is the first case of the theorem. If we assume that $p$ is not a critical point of $f$ then $d_pf(\dot\gamma(0)) = 0$ implies that $\dot\gamma(0)$ is tangent to the level set of $f$ passing through $p$, by Lemma 24. □
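A quick numerical sanity check of Theorem 23 can be sketched in Python under stated assumptions: the curve is the unit circle $C = g^{-1}(0)$ with $g(x, y) = x^2 + y^2 - 1$, and $f(x, y) = x + y$ is an illustrative choice, not an example from the notes. The sketch finds the critical points of $h(\theta) = f(\cos\theta, \sin\theta)$ by bisecting on $h'(\theta) = d_pf(\dot\gamma(\theta))$ (the chain-rule expression used in the proof) and then checks that $\det(\nabla f, \nabla g)$ vanishes there, i.e. that the level sets of $f$ and $g$ share a tangent line. The helper names (`dh`, `critical_angles`, `tangency_defect`) are hypothetical.

```python
import math

# Illustrative choice (assumption): f(x, y) = x + y.
def grad_f(x, y):
    return (1.0, 1.0)

# C = g^{-1}(0) with g(x, y) = x^2 + y^2 - 1 (the unit circle).
def grad_g(x, y):
    return (2.0 * x, 2.0 * y)

def dh(theta):
    """Chain rule: h'(theta) = d_p f(gamma'(theta)) for gamma(theta) = (cos, sin)."""
    x, y = math.cos(theta), math.sin(theta)
    fx, fy = grad_f(x, y)
    return fx * (-math.sin(theta)) + fy * math.cos(theta)

def bisect(fun, a, b, tol=1e-12):
    """Zero of fun in [a, b], assuming fun changes sign on [a, b]."""
    fa = fun(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = fun(m)
        if (fa < 0) == (fm < 0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)

def critical_angles(n=1000):
    """Scan [0, 2*pi] for sign changes of h' and refine each one by bisection."""
    grid = [2 * math.pi * k / n for k in range(n + 1)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        if dh(a) == 0.0:
            roots.append(a)
        elif dh(a) * dh(b) < 0:
            roots.append(bisect(dh, a, b))
    return roots

def tangency_defect(theta):
    """det(grad f, grad g): zero iff the level sets of f and g share a tangent line."""
    x, y = math.cos(theta), math.sin(theta)
    fx, fy = grad_f(x, y)
    gx, gy = grad_g(x, y)
    return fx * gy - fy * gx

for t in critical_angles():
    print(f"theta = {t:.6f}, defect = {tangency_defect(t):.2e}")
```

Since this $f$ has no critical points, only the tangency case of the theorem can occur; the critical points come out near $\theta = \pi/4$ and $\theta = 5\pi/4$.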



1.3.2 ...and in practice 

In practice we write $C = \{g = 0\}$ for some function $g$ (with $d_pg \neq 0$ on $C$). By Lemma 24 the tangent line to a level set of $f$ at $p$ is the kernel of the linear map $d_pf$ (and similarly for $g$). If $p$ is a critical point of $h = f|_{g^{-1}(0)}$ then, by Theorem 23, either $d_pf = 0$ or the level sets of $f$ and $g$ share a tangent line. In the second case the two linear maps $d_pf$ and $d_pg$ have the same kernel and hence they are proportional; in the first case this holds trivially. Either way,
$$d_pf = \lambda\, d_pg \quad \text{for some } \lambda.$$
We take $\lambda$ as a new variable (the Lagrange multiplier) and consider the function
$$H(x, y, \lambda) = f(x, y) - \lambda g(x, y).$$
Varying with respect to $x$ and $y$ we get critical points when
$$d_pf - \lambda\, d_pg = 0,$$
i.e. when $d_pf$ and $d_pg$ are proportional. Varying with respect to $\lambda$, the critical point condition is
$$\frac{\partial H}{\partial \lambda} = -g(x, y) = 0.$$

In other words, critical points of $H$ are triples $(x, y, \lambda)$ such that

• $g(x, y) = 0$,

• $(x, y)$ is a critical point of $h = f|_{g^{-1}(0)}$.

Figure 1.6: The setup for Example 25.
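The recipe "find the critical points of $H(x, y, \lambda) = f - \lambda g$" can be sketched numerically. The Python below is a minimal sketch rather than a general-purpose solver: it applies Newton's method to the three equations $\partial_x H = \partial_y H = \partial_\lambda H = 0$, with the gradients of $f$ and $g$ supplied by hand and the Jacobian approximated by central differences. The function names (`lagrange_newton`, `solve3`) and the starting guess are illustrative assumptions. For $f(x, y) = x$ on the unit circle it should recover the maximum at $(1, 0)$ with $\lambda = 1/2$.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(A, b):
    """Solve A s = b by Cramer's rule (adequate for a small 3x3 system)."""
    d = det3(A)
    sol = []
    for j in range(3):
        Aj = [row[:] for row in A]
        for i in range(3):
            Aj[i][j] = b[i]
        sol.append(det3(Aj) / d)
    return sol

def lagrange_newton(grad_f, grad_g, g, v0, tol=1e-12, max_iter=50, h=1e-6):
    """Newton's method on grad H = 0 for H(x, y, lam) = f(x, y) - lam * g(x, y)."""
    def F(v):
        x, y, lam = v
        fx, fy = grad_f(x, y)
        gx, gy = grad_g(x, y)
        return [fx - lam * gx, fy - lam * gy, -g(x, y)]

    v = list(v0)
    for _ in range(max_iter):
        r = F(v)
        if max(abs(c) for c in r) < tol:
            break
        # Column j of the Jacobian from a central difference in v[j].
        J = [[0.0] * 3 for _ in range(3)]
        for j in range(3):
            vp, vm = v[:], v[:]
            vp[j] += h
            vm[j] -= h
            Fp, Fm = F(vp), F(vm)
            for i in range(3):
                J[i][j] = (Fp[i] - Fm[i]) / (2 * h)
        step = solve3(J, [-c for c in r])
        v = [vi + si for vi, si in zip(v, step)]
    return v

# f(x, y) = x restricted to the circle g(x, y) = x^2 + y^2 - 1 = 0.
x, y, lam = lagrange_newton(
    grad_f=lambda x, y: (1.0, 0.0),
    grad_g=lambda x, y: (2 * x, 2 * y),
    g=lambda x, y: x * x + y * y - 1,
    v0=(0.9, 0.2, 0.4),
)
print(x, y, lam)  # close to 1.0, 0.0, 0.5
```

Newton's method only finds the critical point nearest (in its basin) to the starting guess; starting near $(-1, 0)$ would instead find the minimum with $\lambda = -1/2$.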

Example 25. How close does the circle $x^2 + y^2 = 1$ get to the point $(p, q)$? We know the answer: just rescale $(p, q)$ to have length one and you'll find the closest point to $(p, q)$ on the circle. But for practice, let's work it out using Lagrange multipliers. Let $g(x, y) = x^2 + y^2 - 1$ and let $f(x, y) = (x - p)^2 + (y - q)^2$, which measures the (squared) distance to $(p, q)$. In order to minimise $f$ over $g^{-1}(0)$, we introduce a Lagrange multiplier $\lambda$ (the sign in front of it is immaterial) and seek critical points of
$$H(x, y, \lambda) = (x - p)^2 + (y - q)^2 + \lambda(x^2 + y^2 - 1).$$

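Computing all the first derivatives we get
$$2(x - p) + 2\lambda x = 0,$$
$$2(y - q) + 2\lambda y = 0,$$
$$x^2 + y^2 = 1,$$
which gives (assuming $(p, q) \neq (0, 0)$, so that $1 + \lambda \neq 0$)
$$x = \frac{p}{1 + \lambda}, \qquad y = \frac{q}{1 + \lambda}.$$
Substituting these into the constraint gives
$$\frac{p^2}{(1 + \lambda)^2} + \frac{q^2}{(1 + \lambda)^2} = 1, \quad \text{or} \quad \lambda = \pm\sqrt{p^2 + q^2} - 1.$$
Substituting back gives
$$(x, y) = \pm\frac{(p, q)}{\sqrt{p^2 + q^2}},$$
as we suspected. (The $+$ sign gives the closest point and the $-$ sign the farthest.)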
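The closed-form answer of Example 25 is easy to cross-check numerically. The sketch below (function names are hypothetical) evaluates the Lagrange-multiplier solution for the illustrative point $(p, q) = (3, 4)$, confirms that it satisfies the three equations obtained by differentiating $H$, and compares the resulting squared distance with a brute-force minimum over a fine grid of angles on the circle.

```python
import math

def closest_point_on_circle(p, q):
    """Lagrange-multiplier solution of Example 25 (requires (p, q) != (0, 0))."""
    r = math.hypot(p, q)
    lam = r - 1.0                      # the '+' root of (1 + lam)^2 = p^2 + q^2
    return p / (1 + lam), q / (1 + lam), lam

def residuals(p, q, x, y, lam):
    """The three equations obtained by differentiating H(x, y, lam)."""
    return (2 * (x - p) + 2 * lam * x,
            2 * (y - q) + 2 * lam * y,
            x * x + y * y - 1)

def brute_force_min_dist2(p, q, n=100_000):
    """Squared distance from (p, q) to the circle, minimised over n sample angles."""
    return min((math.cos(t) - p) ** 2 + (math.sin(t) - q) ** 2
               for t in (2 * math.pi * k / n for k in range(n)))

p, q = 3.0, 4.0                        # illustrative point (assumption)
x, y, lam = closest_point_on_circle(p, q)
print(x, y, lam)                       # 0.6 0.8 4.0
print(brute_force_min_dist2(p, q))     # about 16 = (5 - 1)^2
```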



Example 26. Suppose you have enough money to buy four square metres of cardboard and you want to use it to make a lidless box. However, you're allowed to cut the cardboard into the desired shape before you buy it. How should you cut it to maximise the volume of the box? Well, you can control the three dimensions of the box $a, b, c$, and let's suppose that the missing lid would have area $bc$ (so $a$ is the height). The total surface area is therefore
$$2ab + 2ac + bc$$
and the volume is
$$f(a, b, c) = abc.$$
Take $g(a, b, c) = 2ab + 2ac + bc - 4$ and introduce a Lagrange multiplier $\lambda$. We need to find the critical points of
$$H(a, b, c, \lambda) = abc - \lambda(2ab + 2ac + bc - 4)$$
with respect to the four variables $a, b, c, \lambda$.

First, because the situation is symmetric with respect to $b$ and $c$, we expect that the optimal configuration will have $b = c$. To prove this, we differentiate with respect to $b$ and $c$ and we see that, at a critical point,
$$0 = \frac{\partial H}{\partial b} = ac - \lambda(2a + c),$$
$$0 = \frac{\partial H}{\partial c} = ab - \lambda(2a + b).$$
This gives
$$(a - \lambda)c = 2\lambda a = (a - \lambda)b,$$
so $b = c$ unless $a = \lambda$. If $a = \lambda$ then $0 = ac - \lambda(2a + c) = -2a^2$, so $a = 0$, but we are assuming that our box has some height. Therefore we can assume $b = c$.

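The problem is now to find the critical points of
$$ab^2 - \lambda(4ab + b^2 - 4),$$
and the vanishing of the partial derivatives with respect to $a$, $b$ and $\lambda$ gives
$$b^2 - 4\lambda b = 0,$$
$$2ab - 4\lambda a - 2\lambda b = 0,$$
$$4ab + b^2 = 4.$$
Since $b \neq 0$, the first and third of these equations become
$$b = 4\lambda, \qquad a = \frac{4 - b^2}{4b} = \frac{1 - 4\lambda^2}{4\lambda},$$
and hence the second becomes $4\lambda(a - 2\lambda) = 0$, i.e. $a = 2\lambda$, so that
$$2\lambda = \frac{1 - 4\lambda^2}{4\lambda}, \quad \text{i.e.} \quad 1 = 12\lambda^2.$$
Hence the solution is
$$a = 1/\sqrt{3}, \qquad b = c = 2/\sqrt{3}.$$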
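The answer to Example 26 can be sanity-checked in a few lines. This sketch (names are illustrative) verifies that $a = 1/\sqrt{3}$, $b = c = 2/\sqrt{3}$, $\lambda = 1/(2\sqrt{3})$ satisfies the constraint and makes the gradient of $H$ vanish, and then compares the volume with a brute-force search in which $b, c$ range over a grid and the height $a$ is solved from the constraint $2ab + 2ac + bc = 4$.

```python
import math

a = 1 / math.sqrt(3)                  # height
b = c = 2 / math.sqrt(3)              # base sides
lam = 1 / (2 * math.sqrt(3))          # Lagrange multiplier

def surface(a, b, c):
    return 2 * a * b + 2 * a * c + b * c   # four sides plus the base bc (no lid)

def volume(a, b, c):
    return a * b * c

# Partial derivatives of H = abc - lam * (2ab + 2ac + bc - 4) in a, b, c.
grad_H = (b * c - lam * (2 * b + 2 * c),
          a * c - lam * (2 * a + c),
          a * b - lam * (2 * a + b))

def best_on_grid(step=0.01):
    """Brute force: choose b, c on a grid and solve the constraint for a."""
    best = 0.0
    n = int(2 / step)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            bb, cc = i * step, j * step
            aa = (4 - bb * cc) / (2 * bb + 2 * cc)
            if aa > 0:
                best = max(best, volume(aa, bb, cc))
    return best

print(surface(a, b, c), volume(a, b, c), best_on_grid())
```

The optimal volume is $4/(3\sqrt{3}) \approx 0.770$, and the grid search should approach it from below.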



Chapter 2 



Calculus of variations 



Up to now we have been interested in a very limited class of optimisation problems: allowing ourselves a finite set of variables (usually one, two or three) and maximising a function over these. Things become much more interesting when we have an infinite amount of freedom to vary. For example, we might be interested in all paths between two points, or all surfaces in space with a given boundary curve, and we might be interested in minimising length, respectively area. We will first deal with a classic example: proving that a line between two points in the plane is the shortest path joining them. Then we will move on to the more general theory, illustrating it with a plethora of examples.

A very nice introduction to these ideas is provided by the Feynman lecture on the principle of least action.



2.1 Straight lines are shortest paths 



I want to convince you that a straight line is the shortest path between two 
points in the plane. Maybe you don't need much convincing of this fact, but 
did you ever really think about what this means? I'm saying that if you fix 
two points and consider the space of all possible paths between them (which 
is of course an incomprehensibly large, in fact infinite dimensional, space) you 
can define a functional on this infinite-dimensional space of paths which takes 
each path to its length and this functional has a global minimum at the path 
given by a straight line. It suddenly seems like a more intimidating task to 
really convince ourselves that this is true, and it seems more surprising that our 
human intuition picked up on this without our noticing. 
2.1.1 Paths and length



See http://youtu.be/bM_klC-oAzg. It seems like the key step in proving the statement is in understanding what we mean by a path. Let us fix two points $A$ and $B$ in the plane with coordinates $(A_1, A_2)$ and $(B_1, B_2)$. By a path between them we mean a map
$$\gamma: [0, 1] \to \mathbb{R}^2$$
such that $\gamma(0) = A$ and $\gamma(1) = B$. Alternatively, we can project $\gamma(t)$ to the two coordinate axes and think of it as a pair of functions $\gamma(t) = (\gamma_1(t), \gamma_2(t))$ satisfying $\gamma_1(0) = A_1$, $\gamma_1(1) = B_1$, etc. Now we don't want our paths to jump around discontinuously in the plane, so we'll certainly require the two functions $\gamma_1$ and $\gamma_2$ to be continuous. In fact, even to define the length of a path we're going to need to require slightly more:

• we need $\gamma_1$ and $\gamma_2$ to be differentiable and their derivatives to be continuous;

• we also want the derivatives $\dot\gamma_1$ and $\dot\gamma_2$ never to vanish simultaneously; in other words the vector $\dot\gamma$ is never allowed to vanish.

I'll explain why we need these when we need them. 

When we try to define the length of a curved path, the natural thing to do is to zoom in very closely on the curve and recall from Taylor's theorem that on very small scales the path is well approximated by a line. We can imagine taking finer and finer polygonal approximations of our path and defining the length to be the limit of the lengths of the polygons. This is of course the same as defining length by an integral along the curve, and the infinitesimal length along the curve (the infinitesimal arc-length) is just $dt$ times the length of the vector $\dot\gamma$, where the dot denotes time-differentiation. Using Pythagoras, this is
$$L(\gamma) = \int_0^1 |\dot\gamma(t)|\,dt = \int_0^1 \sqrt{\dot\gamma_1(t)^2 + \dot\gamma_2(t)^2}\,dt.$$
If we wanted to be wholly rigorous we would first show that the definition by 
polygonal approximations gave a well-defined number and then prove that this 
number could be computed as the above integral. Instead, I'll take the shortcut 
of defining the length of a differentiable path to be the above integral. 
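To see the two definitions agree on an example, here is a small Python sketch. The path is an assumption chosen for illustration: a quarter of the unit circle traversed at non-constant speed, $\gamma(t) = (\cos\vartheta(t), \sin\vartheta(t))$ with $\vartheta(t) = \pi(t + t^2)/4$, so that $|\dot\gamma(t)| = \pi(1 + 2t)/4$ never vanishes. The inscribed-polygon lengths increase towards $\pi/2$, and Simpson's rule applied to $|\dot\gamma|$ gives the same number.

```python
import math

def gamma(t):
    """Illustrative path (assumption): quarter circle with theta(t) = pi (t + t^2) / 4."""
    th = math.pi * (t + t * t) / 4
    return (math.cos(th), math.sin(th))

def speed(t):
    """|gamma'(t)| = theta'(t) = pi (1 + 2t) / 4, never zero on [0, 1]."""
    return math.pi * (1 + 2 * t) / 4

def polygon_length(n):
    """Length of the inscribed polygon through gamma(k / n), k = 0, ..., n."""
    pts = [gamma(k / n) for k in range(n + 1)]
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:]))

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

for n in (4, 16, 64, 256):
    print(n, polygon_length(n))
print("integral:", simpson(speed, 0.0, 1.0), "exact:", math.pi / 2)
```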

Remark 27. Note that here we need the path to be differentiable even to write down the integrand. We also need the integral to be well-defined; if the derivative of $\gamma$ is continuous then so is the integrand, and we know how to integrate continuous functions on a closed interval (Riemann integration from Analysis I/II).
2.1.2 Straight lines minimise action

Actually, the square root really causes us headaches, so I want to get rid of it for the moment and define the action of a path to be the integral
$$a(\gamma) = \int_0^1 |\dot\gamma(t)|^2\,dt.$$

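We'll prove

Proposition 28. If $\gamma$ is a straight line between $A$ and $B$ (parametrised linearly, so that $\dot\gamma$ is constant) and $\delta$ is another path between $A$ and $B$ then
$$a(\gamma) \le a(\delta),$$
with equality if and only if $\delta = \gamma$.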

In other words, straight lines minimise action. We'll see later how to deduce 
the claim about length from this claim about action. 

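Proof. Define $\epsilon(t) = \delta(t) - \gamma(t)$. Then
$$a(\delta) = \int_0^1 |\dot\gamma + \dot\epsilon|^2\,dt = \int_0^1 \left(|\dot\gamma|^2 + 2\dot\gamma\cdot\dot\epsilon + |\dot\epsilon|^2\right)dt = a(\gamma) + a(\epsilon) + 2\int_0^1 \dot\gamma\cdot\dot\epsilon\,dt.$$
Now because $\gamma$ is linear, $\dot\gamma$ is constant and hence the final term is
$$2\dot\gamma\cdot\int_0^1 \dot\epsilon\,dt = 2\dot\gamma\cdot\left[\epsilon(t)\right]_{t=0}^{t=1}$$
by the fundamental theorem of calculus. But because $\epsilon(0) = \epsilon(1) = 0$ (remember the endpoints of the path are fixed) this term vanishes. Now we see that
$$a(\delta) = a(\gamma) + a(\epsilon) \ge a(\gamma),$$
with equality if and only if $\dot\epsilon \equiv 0$, i.e. (since $\epsilon(0) = 0$) if and only if $\epsilon \equiv 0$, i.e. $\delta = \gamma$. □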

The key idea here was to integrate by parts (i.e. apply the fundamental theorem of calculus) and use the fact that $\gamma$ satisfies some highly restrictive equation ($\dot\gamma$ constant) to get rid of the term linear in $\epsilon$. We were also lucky enough that the remaining term $a(\epsilon)$ was nonnegative (and zero only when $\epsilon \equiv 0$), which is what allows us to prove this incredibly strong result about global minimisation of action by lines in Euclidean space. It's not usually the case in such variational arguments that we can control the higher-order terms in $\epsilon$.
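The identity $a(\delta) = a(\gamma) + a(\epsilon)$ from the proof can be checked numerically. In this sketch the endpoints $A = (0, 0)$, $B = (2, 1)$ and the variation $\epsilon(t) = (0.3\sin\pi t,\ 0.2\sin 2\pi t)$ are illustrative assumptions; the point is that the cross term $2\int_0^1 \dot\gamma\cdot\dot\epsilon\,dt$ vanishes because $\dot\gamma$ is constant and $\epsilon$ vanishes at both ends.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

A, B = (0.0, 0.0), (2.0, 1.0)          # illustrative endpoints (assumption)
v = (B[0] - A[0], B[1] - A[1])         # gamma'(t) is the constant vector B - A

def eps_dot(t):
    """Derivative of the illustrative variation eps(t) = (0.3 sin(pi t), 0.2 sin(2 pi t))."""
    return (0.3 * math.pi * math.cos(math.pi * t),
            0.4 * math.pi * math.cos(2 * math.pi * t))

def action(vel):
    """a(path) = integral over [0, 1] of |path'(t)|^2, given path' as a function."""
    return simpson(lambda t: vel(t)[0] ** 2 + vel(t)[1] ** 2, 0.0, 1.0)

a_gamma = action(lambda t: v)
a_eps = action(eps_dot)
a_delta = action(lambda t: (v[0] + eps_dot(t)[0], v[1] + eps_dot(t)[1]))
cross = 2 * simpson(lambda t: v[0] * eps_dot(t)[0] + v[1] * eps_dot(t)[1], 0.0, 1.0)

print(a_gamma, a_eps, a_delta, cross)  # cross term is zero up to rounding
```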






2.1.3 Straight lines minimise length 

Now we want to argue that straight lines are in fact shortest. If I give you a differentiable path $\gamma$ you can reparametrise it. To do this, take your favourite differentiable, strictly monotonically increasing function $\phi: [0, 1] \to [0, 1]$ with $\phi(0) = 0$ and $\phi(1) = 1$, and form the composition $\delta = \gamma \circ \phi$, in other words
$$\delta(t) = \gamma(\phi(t)).$$
This gives a new path (differentiable by the chain rule!). Reparametrising doesn't change the length of a path but it can certainly change the action.

Proposition 29. Any path can be reparametrised so that its action is equal to the square of its length.

Proof. The parametrisation is defined as follows.

• Let
$$s(t) = \int_0^t |\dot\gamma(\tau)|\,d\tau$$
be the arc-length after time $t$. Provided that $\dot\gamma$ is never zero, $s$ is a strictly monotonically increasing function of $t$.

• We normalise and consider the function $\phi(t) = s(t)/L(\gamma)$, which measures what proportion of the arc-length has been traversed after time $t$. It is easy to see that $\phi(t)$ is differentiable since, by the fundamental theorem of calculus, its derivative is
$$\dot\phi(t) = |\dot\gamma(t)|/L(\gamma).$$
Remember we are assuming that $|\dot\gamma| \neq 0$ everywhere.

• Since $\phi$ is strictly monotonically increasing with $\phi(0) = 0$ and $\phi(1) = 1$, it is a bijection of $[0, 1]$ and hence it has an inverse $\phi^{-1}$, which takes as input a number $\lambda \in [0, 1]$ and outputs the time $t$ at which $\gamma$ has traversed an arc-length $\lambda L(\gamma)$. We will need the following fact about $\phi^{-1}$:

Lemma 30. If $\phi: [0, 1] \to [0, 1]$ is a bijective, once continuously-differentiable function whose derivative is always positive then it admits a continuously-differentiable inverse $\phi^{-1}$ whose derivative is equal to
$$\frac{d\phi^{-1}}{d\lambda}(\lambda) = \frac{1}{\dot\phi(\phi^{-1}(\lambda))}.$$

• Therefore $\phi^{-1}$ gives us a reparametrisation. What does the curve $\delta = \gamma \circ \phi^{-1}$ do? At time $t = 1/2$ it moves to the point on $\gamma$ which is exactly halfway along (as measured by arc-length).






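Now the reparametrised path $\delta = \gamma \circ \phi^{-1}$ has
$$\left|\frac{d\delta}{dt}\right| = \left|\dot\gamma(\phi^{-1}(t))\right|\frac{d\phi^{-1}}{dt}(t) = \frac{|\dot\gamma(\phi^{-1}(t))|}{\dot\phi(\phi^{-1}(t))} = L(\gamma)$$
and hence its action is
$$\int_0^1 |\dot\delta(t)|^2\,dt = \int_0^1 L(\gamma)^2\,dt = L(\gamma)^2.$$
□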

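Now it is clear that the action of a (linearly parametrised) straight line $\gamma$ is $L(\gamma)^2$, since $|\dot\gamma| = L(\gamma)$ is constant. This plus the previous two propositions tells us that the length of a straight line is strictly less than the length of any other path joining the points: given such a path, reparametrise it as in Proposition 29 to get a path $\delta$ with $a(\delta) = L(\delta)^2$; if $\delta \neq \gamma$ then Proposition 28 gives $L(\gamma)^2 = a(\gamma) < a(\delta) = L(\delta)^2$.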
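A Python sketch of Propositions 28 and 29 on one example. The path is an assumption: the segment from $(0, 0)$ to $(3, 4)$ traversed unevenly as $\gamma(t) = \tfrac{t + t^2}{2}(3, 4)$, so $L(\gamma) = 5$ but the action is $325/12 > 25$. The arc-length reparametrisation $\delta = \gamma \circ \phi^{-1}$ is built numerically (Simpson's rule for $\phi$, bisection for $\phi^{-1}$), and the discrete length $\sum |\Delta P|$ and discrete action $\sum |\Delta P|^2/\Delta t$ are compared. Helper names are hypothetical.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

def gamma(t):
    """Illustrative path (assumption): the segment (0,0)-(3,4), traversed unevenly."""
    s = (t + t * t) / 2
    return (3 * s, 4 * s)

def speed(t):
    return 2.5 * (1 + 2 * t)           # |gamma'(t)|, never zero on [0, 1]

L = simpson(speed, 0.0, 1.0, 10)       # the length L(gamma) = 5

def phi(t):
    """Proportion of the arc-length traversed by time t."""
    return simpson(speed, 0.0, t, 10) / L

def phi_inv(u, tol=1e-14):
    """Invert the increasing function phi by bisection."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def delta(u):
    """The arc-length reparametrisation gamma o phi^{-1}."""
    return gamma(phi_inv(u))

def length_and_action(path, n=1000):
    """Polygonal length, and the discrete action: sum of |step|^2 divided by dt = 1/n."""
    pts = [path(k / n) for k in range(n + 1)]
    steps = [math.dist(p, q) for p, q in zip(pts, pts[1:])]
    return sum(steps), n * sum(d * d for d in steps)

print(length_and_action(gamma))        # length close to 5, action close to 325/12
print(length_and_action(delta))        # length close to 5, action close to 25 = L^2
```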

2.2 The Euler-Lagrange equations 
2.2.1 Key steps reviewed 

Let us recall the key steps from the last section: 

• Define some (infinite-dimensional) function space $X$; in that section it was the space of paths connecting $A$ and $B$.

• Define a functional $F: X \to \mathbb{R}$ on that space; in that section it was the action functional.

• Take a supposed critical point $\gamma \in X$ of the functional; in that section it was the straight line segment.

• Compute $F(\gamma + \epsilon)$, where $\gamma + \epsilon$ is a small variation of $\gamma$. Usually you can only compute this to first order in $\epsilon$; in other words, you can usually only compute the quantity we think of as the directional derivative of $F$ in the $\epsilon$-direction. We call this the variation in $F$ associated to the variation $\epsilon$ of $\gamma$ (this is where the name variational calculus comes from).

• Wherever the derivative of $\epsilon$ occurs in the resulting integral, integrate by parts.

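Let's do this again for the action of a path:
$$a(\gamma) = \int_0^1 \left(\dot\gamma_1^2 + \dot\gamma_2^2\right)dt.$$
We get
$$a(\gamma + \epsilon) = a(\gamma) + 2\int_0^1 \left(\dot\gamma_1\dot\epsilon_1 + \dot\gamma_2\dot\epsilon_2\right)dt + O(\epsilon^2),$$
so the directional derivative of $a$ in the $\epsilon$-direction at $\gamma$ is
$$d_\gamma a(\epsilon) = 2\int_0^1 \left(\dot\gamma_1\dot\epsilon_1 + \dot\gamma_2\dot\epsilon_2\right)dt = 2\left[\dot\gamma_1\epsilon_1 + \dot\gamma_2\epsilon_2\right]_0^1 - 2\int_0^1 \left(\ddot\gamma_1\epsilon_1 + \ddot\gamma_2\epsilon_2\right)dt,$$
where we have integrated by parts to get rid of the $\dot\epsilon$s. Note that the boundary term vanishes because $\epsilon(0) = \epsilon(1) = 0$.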
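The integration by parts above can be checked on a concrete, non-straight path. In the sketch below, $\gamma(t) = (t, t^2)$ (so $\ddot\gamma = (0, 2)$) and $\epsilon(t) = (\sin\pi t,\ t(1 - t))$ are illustrative assumptions. A central finite difference of $s \mapsto a(\gamma + s\epsilon)$ at $s = 0$ is compared with the integrated-by-parts form $-2\int_0^1 \ddot\gamma\cdot\epsilon\,dt$; both should equal $-2/3$.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

# Illustrative data (assumptions): gamma(t) = (t, t^2), eps(t) = (sin(pi t), t(1 - t)),
# so that eps(0) = eps(1) = 0.
def gamma_dot(t):
    return (1.0, 2.0 * t)

def gamma_ddot(t):
    return (0.0, 2.0)

def eps(t):
    return (math.sin(math.pi * t), t * (1 - t))

def eps_dot(t):
    return (math.pi * math.cos(math.pi * t), 1 - 2 * t)

def perturbed_action(s):
    """a(gamma + s * eps) = integral over [0, 1] of |gamma' + s eps'|^2."""
    def integrand(t):
        (gx, gy), (ex, ey) = gamma_dot(t), eps_dot(t)
        return (gx + s * ex) ** 2 + (gy + s * ey) ** 2
    return simpson(integrand, 0.0, 1.0)

s = 1e-3
finite_difference = (perturbed_action(s) - perturbed_action(-s)) / (2 * s)
after_parts = -2 * simpson(
    lambda t: gamma_ddot(t)[0] * eps(t)[0] + gamma_ddot(t)[1] * eps(t)[1], 0.0, 1.0)
print(finite_difference, after_parts)  # both approximately -2/3
```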

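Recall that a critical point is somewhere that all directional derivatives vanish. So the condition that $\gamma$ is a critical point of $a$ is just
$$d_\gamma a(\epsilon) = 0 \quad \text{for all } \epsilon,$$
i.e.
$$\int_0^1 \ddot\gamma\cdot\epsilon\,dt = 0 \quad \text{for all } \epsilon.$$
We now need the following result.

Theorem 31 (Fundamental theorem of the calculus of variations). Suppose that $y: [0, 1] \to \mathbb{R}^n$ is a continuous vector-valued function. If $\int_0^1 y(t)\cdot\epsilon(t)\,dt = 0$ for all $C^1$-functions ("variations") $\epsilon: [0, 1] \to \mathbb{R}^n$ with $\epsilon(0) = \epsilon(1) = 0$, then $y = 0$.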

In particular this implies that $\ddot\gamma = 0$; in other words the components of $\gamma$ have to be linear functions of $t$, i.e. $\gamma$ is a linearly parametrised straight line. So even if we hadn't known the answer was a line to begin with, we could have worked it out (provided we could solve the differential equation $\ddot\gamma = 0$).

Remark 32. Notice that the equation we obtained above is a second-order equation ($\ddot\gamma = 0$). This will hold in general because of the step where we integrate by parts. This means that we should always assume our function $\gamma$ is twice continuously differentiable ($C^2$) instead of just $C^1$.

2.2.2 The fundamental theorem of the calculus of variations

In this section we will prove Theorem 31.
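Proof. Suppose, to the contrary, that there is $t_0 \in [0, 1]$ with $y(t_0) \neq 0$. By continuity we may take $t_0 \in (0, 1)$ and (reordering the components and replacing $y$ by $-y$ if necessary) we may as well assume that the component $y_1(t_0) > 0$. Because $y_1$ is continuous and $y_1(t_0) > 0$, we know that $y_1(t) > 0$ for all $t$ in some small interval $(t_0 - \delta, t_0 + \delta) \subset (0, 1)$.

Define a "bump function" $f: [0, 1] \to \mathbb{R}$ which is

• $C^1$,

• nonnegative everywhere and positive at $t_0$,

• and vanishes outside the interval $(t_0 - \delta, t_0 + \delta)$.

Such functions are quite easy to construct. For instance
$$f(t) = \begin{cases} \exp\left(\dfrac{1}{(t - t_0)^2 - \delta^2}\right) & \text{if } t \in (t_0 - \delta, t_0 + \delta), \\ 0 & \text{otherwise} \end{cases}$$
will do!

Now consider the function $\epsilon: [0, 1] \to \mathbb{R}^n$ given by
$$\epsilon(t) = (f(t), 0, \ldots, 0),$$
which vanishes at $t = 0$ and $t = 1$. Integrating this against $y$ gives
$$\int_0^1 y(t)\cdot\epsilon(t)\,dt = \int_{t_0 - \delta}^{t_0 + \delta} f(t)\,y_1(t)\,dt > 0,$$
which contradicts the assumption that $\int_0^1 y(t)\cdot\epsilon(t)\,dt = 0$ for all $\epsilon$. □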
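The bump function from the proof is easy to evaluate directly. In this sketch the choice $y_1(t) = t - 0.3$, $t_0 = 0.6$, $\delta = 0.25$ is an illustrative assumption (so that $y_1 > 0$ on the whole support). The bump values are tiny, e.g. $f(t_0) = e^{-16}$, but they are strictly positive, and that is all the argument needs: the integral $\int f\,y_1\,dt$ comes out positive while $f$ vanishes outside $(t_0 - \delta, t_0 + \delta)$.

```python
import math

def bump(t, t0, d):
    """exp(1 / ((t - t0)^2 - d^2)) on (t0 - d, t0 + d), and 0 elsewhere."""
    u = (t - t0) ** 2 - d * d
    return math.exp(1.0 / u) if u < 0 else 0.0   # exp underflows smoothly to 0.0 near the edges

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

t0, d = 0.6, 0.25                      # illustrative values (assumption)

def y1(t):
    """Illustrative component of y, positive on (0.3, 1]."""
    return t - 0.3

integral = simpson(lambda t: bump(t, t0, d) * y1(t), 0.0, 1.0)
print(bump(t0, t0, d), bump(0.2, t0, d), integral)
```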
2.2.3 The Euler-Lagrange equation

Suppose we are interested in a situation where the space $X$ is a space of $C^2$-functions $\phi: [a, b] \to \mathbb{R}$ of one variable, $t$. Suppose moreover that the values of $\phi$ are specified at the endpoints of the interval $[a, b]$,
$$\phi(a) = A, \qquad \phi(b) = B.$$
The other functions in the space $X$ can be written
$$\phi + \epsilon$$
for some variation $\epsilon: [a, b] \to \mathbb{R}$ with $\epsilon(a) = \epsilon(b) = 0$. Let $L$ be a function of three variables
$$L(p, q, r)$$
and suppose that the functional we want to minimise is
$$F(\phi) = \int_a^b L(t, \phi(t), \dot\phi(t))\,dt;$$
in other words the integrand depends only on $t$, $\phi$ and its first derivative. For instance, the integrands in the length and action functionals above only depended on $\dot\phi$. We call the function $L$ the Lagrangian of our problem.

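Now perturb $\phi$ by a variation $\epsilon$ and expand the integrand to first order in $\epsilon$ using the chain rule:
$$L(t, \phi + \epsilon, \dot\phi + \dot\epsilon) = L(t, \phi(t), \dot\phi(t)) + \epsilon(t)\frac{\partial L}{\partial q}(t, \phi(t), \dot\phi(t)) + \dot\epsilon(t)\frac{\partial L}{\partial r}(t, \phi(t), \dot\phi(t)) + O(\epsilon^2).$$
Integrating we get
$$F(\phi + \epsilon) = F(\phi) + \int_a^b \epsilon\frac{\partial L}{\partial q}\,dt + \left[\epsilon\frac{\partial L}{\partial r}\right]_a^b - \int_a^b \epsilon\frac{d}{dt}\left(\frac{\partial L}{\partial r}\right)dt + O(\epsilon^2),$$
where we have integrated by parts and picked up a boundary term. The boundary term vanishes because $\epsilon(a) = \epsilon(b) = 0$. From this we see that the directional derivative of $F$ in the $\epsilon$-direction at $\phi$ is
$$\int_a^b \left(\frac{\partial L}{\partial q}(t, \phi(t), \dot\phi(t)) - \frac{d}{dt}\left(\frac{\partial L}{\partial r}(t, \phi(t), \dot\phi(t))\right)\right)\epsilon(t)\,dt.$$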
If $\phi$ is a critical point then all directional derivatives vanish, which means that the above integral vanishes for all $\epsilon$. By Theorem 31 this means that
$$\frac{\partial L}{\partial q}(t, \phi(t), \dot\phi(t)) - \frac{d}{dt}\left(\frac{\partial L}{\partial r}(t, \phi(t), \dot\phi(t))\right) = 0,$$
which is now a second-order differential equation for $\phi$. Note that we have written $\frac{d}{dt}$ and not $\frac{\partial}{\partial t}$ because the object being differentiated is actually just a function of one variable, $t$.

Because $L(p, q, r)$ is often written $L(t, \phi, \dot\phi)$, the $q$- and $r$-derivatives are often written
$$\frac{\partial L}{\partial \phi} \quad \text{and} \quad \frac{\partial L}{\partial \dot\phi},$$
which looks extremely confusing the first time you see it: how can $\phi$ and $\dot\phi$ be independent variables? You should just think of it as convenient shorthand for the equation I've just written, and henceforth we'll adopt this notation too. The Euler-Lagrange equation is then
$$\frac{\partial L}{\partial \phi} - \frac{d}{dt}\frac{\partial L}{\partial \dot\phi} = 0 \qquad (2.1)$$
and that's what you'll always see written.
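To see equation (2.1) at work numerically, take the illustrative Lagrangian $L = \dot\phi^2 + \phi^2$ on $[0, 1]$ with $\phi(0) = 0$, $\phi(1) = 1$ (an assumption, not an example from the notes). Here (2.1) reads $2\phi = \frac{d}{dt}(2\dot\phi)$, i.e. $\ddot\phi = \phi$, solved by $\phi(t) = \sinh t/\sinh 1$. The sketch estimates the directional derivative of $F$ in the direction $\epsilon(t) = \sin\pi t$ by a central difference: it is essentially zero at the solution, but equals $2/\pi$ at the non-solution $\phi(t) = t$, which has the same boundary values.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

def F(phi, phi_dot):
    """Illustrative functional (assumption): integral over [0, 1] of phi'^2 + phi^2."""
    return simpson(lambda t: phi_dot(t) ** 2 + phi(t) ** 2, 0.0, 1.0)

def directional_derivative(phi, phi_dot, eps, eps_dot, s=1e-3):
    """Central difference of s -> F(phi + s * eps) at s = 0."""
    plus = F(lambda t: phi(t) + s * eps(t), lambda t: phi_dot(t) + s * eps_dot(t))
    minus = F(lambda t: phi(t) - s * eps(t), lambda t: phi_dot(t) - s * eps_dot(t))
    return (plus - minus) / (2 * s)

def eps(t):                            # a variation vanishing at t = 0 and t = 1
    return math.sin(math.pi * t)

def eps_dot(t):
    return math.pi * math.cos(math.pi * t)

def sol(t):                            # solves phi'' = phi with phi(0) = 0, phi(1) = 1
    return math.sinh(t) / math.sinh(1.0)

def sol_dot(t):
    return math.cosh(t) / math.sinh(1.0)

def line(t):                           # same boundary values, but not a solution
    return t

def line_dot(t):
    return 1.0

print(directional_derivative(sol, sol_dot, eps, eps_dot))     # approximately 0
print(directional_derivative(line, line_dot, eps, eps_dot))   # approximately 2/pi
```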



2.2.4 Beltrami identity 



A second-order differential equation is usually harder to solve than a first-order differential equation. If the Lagrangian $L(p, q, r)$ is independent of $p$ then it turns out there is a first-order equation we can use instead of the Euler-Lagrange equation.

Theorem 33. Suppose $L(p, q, r)$ has no $p$-dependence (i.e. $\partial L/\partial p = 0$). If $\phi$ satisfies the associated Euler-Lagrange equation then
$$L(t, \phi(t), \dot\phi(t)) - \dot\phi(t)\frac{\partial L}{\partial r}(t, \phi(t), \dot\phi(t)) = C$$
for some constant $C$. This is called the Beltrami identity.



Of course, this is usually written
$$L - \dot\phi\frac{\partial L}{\partial \dot\phi} = C.$$



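Proof. Throughout the proof, we write
$$L, \quad \frac{\partial L}{\partial p}, \quad \text{etc.}$$
instead of
$$L(t, \phi(t), \dot\phi(t)), \quad \frac{\partial L}{\partial p}(t, \phi(t), \dot\phi(t)), \quad \text{etc.}$$
Using the chain rule, we get
$$\frac{d}{dt}\left(L - \dot\phi\frac{\partial L}{\partial r}\right) = \frac{\partial L}{\partial p} + \dot\phi\frac{\partial L}{\partial q} + \ddot\phi\frac{\partial L}{\partial r} - \ddot\phi\frac{\partial L}{\partial r} - \dot\phi\frac{d}{dt}\left(\frac{\partial L}{\partial r}\right).$$
By assumption $\partial L/\partial p = 0$, so the first term on the right-hand side vanishes. The two terms with $\ddot\phi$ cancel and we are left with
$$\frac{d}{dt}\left(L - \dot\phi\frac{\partial L}{\partial r}\right) = \dot\phi\left(\frac{\partial L}{\partial q} - \frac{d}{dt}\frac{\partial L}{\partial r}\right).$$
The right-hand side is now clearly recognisable as $\dot\phi(t)$ times the Euler-Lagrange operator. Since we're assuming that $\phi$ satisfies the Euler-Lagrange equation, we see that $\frac{d}{dt}\left(L - \dot\phi\frac{\partial L}{\partial r}\right) = 0$, so
$$L(t, \phi(t), \dot\phi(t)) - \dot\phi(t)\frac{\partial L}{\partial r}(t, \phi(t), \dot\phi(t)) = C$$
for some constant $C$, as required. □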
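A quick numerical illustration of Theorem 33, with an illustrative Lagrangian chosen as an assumption: $L = \dot\phi^2 + \phi^2$, which has no explicit $t$-dependence. Its Euler-Lagrange equation $\ddot\phi = \phi$ with $\phi(0) = 0$, $\phi(1) = 1$ is solved by $\phi(t) = \sinh t/\sinh 1$. Since $\partial L/\partial\dot\phi = 2\dot\phi$, the Beltrami quantity is $L - \dot\phi\,\partial L/\partial\dot\phi = \phi^2 - \dot\phi^2$, which should be the constant $-1/\sinh^2 1$ along the solution.

```python
import math

def beltrami(t):
    """L - phi' * dL/dphi' for L = phi'^2 + phi^2 along phi(t) = sinh(t) / sinh(1)."""
    phi = math.sinh(t) / math.sinh(1.0)
    phi_dot = math.cosh(t) / math.sinh(1.0)
    L = phi_dot ** 2 + phi ** 2
    dL_dphi_dot = 2 * phi_dot
    return L - phi_dot * dL_dphi_dot

values = [beltrami(k / 10) for k in range(11)]
print(values)   # every entry is -1 / sinh(1)^2, roughly -0.724, up to rounding
```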

When you study classical mechanics in the Lagrangian/Hamiltonian setting (Analytical Dynamics, next term) you will be able to understand this as a statement of conservation of energy for an autonomous Hamiltonian system.

Remark 34. It seems strange that we can replace the second-order Euler-Lagrange equation with its two boundary conditions $\phi(a) = A$, $\phi(b) = B$ by a first-order equation with two boundary conditions (usually we only have the freedom to fix one boundary condition for a first-order equation). The point is that the Beltrami identity has an undetermined constant, $C$, which we can fix using the second boundary condition.



2.2.5 Vector-valued functions 

We often come across problems where we have to optimise over a space of vector-valued functions. For instance, showing that a straight line is shortest involved looking at two-component functions $(\gamma_1(t), \gamma_2(t))$. In this case there is an Euler-Lagrange equation for each component. Suppose that the vector-valued function is $(f_1(t), \ldots, f_n(t))$ and that
$$L(t, f_1(t), \ldots, f_n(t), \dot f_1(t), \ldots, \dot f_n(t))$$
is the Lagrangian. Then the Euler-Lagrange equations are
$$\frac{\partial L}{\partial f_1} = \frac{d}{dt}\left(\frac{\partial L}{\partial \dot f_1}\right), \quad \ldots, \quad \frac{\partial L}{\partial f_n} = \frac{d}{dt}\left(\frac{\partial L}{\partial \dot f_n}\right).$$
This is easy to see by taking variations individually. For example, the variation $(f_1 + \epsilon_1, f_2, \ldots, f_n)$ gives rise to the first equation. In fact you could think of this as like the "partial derivative" of the functional in the $f_1$-direction (very loosely speaking).

Problem 35. If $\partial L/\partial t = 0$ then a Beltrami identity holds:
$$L - \sum_{i=1}^n \dot f_i\frac{\partial L}{\partial \dot f_i} = C$$
for some constant $C$.

When we do these basic variational calculations all we are doing is finding the critical points of a particular functional. It is possible to do a second-derivative test to figure out if these are local maxima or minima; however, most of the time we want to know when something is a global maximum or minimum, i.e. it's the biggest/smallest of all possibilities. The methods we are developing are purely local and won't tell us such global information, in the same way that ordinary calculus won't tell you global information for an ordinary function. The theorems we prove therefore have the form: "Assume a global minimum $x$ of the functional $F: X \to \mathbb{R}$ were to exist. Then $x$ would be a straight line / circle / catenary..."



Problem 36. To illustrate the difficulty (in the finite-dimensional setting), sketch for me the graph of a function $f: \mathbb{R} \to \mathbb{R}$ which admits a local minimum at the origin but has no global minimum.



There is a set of harder analytical techniques belonging to the calculus of variations which allow one to prove the existence of global maximisers/minimisers. The idea is roughly the following. Suppose the functional $F: X \to \mathbb{R}$ you are interested in is bounded (say from below). Take a sequence $x_k \in X$ such that $F(x_k)$ tends to the infimum $\inf_{x \in X} F(x)$. Try to find a convergent subsequence. This subsequence may not have a limit in the space $X$: if the space $X$ consists of $C^2$-functions then maybe there is some loss of differentiability in the limit. Nonetheless, the subsequence has a limit in some slightly larger space $\bar{X}$ (using something like the Arzelà-Ascoli theorem) and this limit is a global minimum there. This implies that it satisfies a weak form of the Euler-Lagrange equation (weak in the sense that the limit may not be differentiable, and hence the Euler-Lagrange equation doesn't even make sense for it). Now apply some kind of regularity theory to prove that the limit is actually as smooth as you want it to be, which implies that the global minimum exists in $X$. You will hopefully meet such compactness and regularity theorems in future functional analysis/calculus of variations/geometric analysis courses.

Of course in the examples we care about in this course, this technology proves the existence results we want. In the words of Hilbert:

Every problem in the Calculus of Variations has a solution, provided the word 'solution' is suitably understood.
2.3 Examples
2.3.1 The catenoid 

Let $\phi: [a, b] \to \mathbb{R}$ be a function with $\phi(a) = A$, $\phi(b) = B$ and $\phi(x) > 0$ for all $x \in [a, b]$. Consider the graph of this function inside $\mathbb{R}^2$,
$$\{(x, z) \in \mathbb{R}^2 : z = \phi(x)\},$$
and consider this $\mathbb{R}^2$ as the $(x, z)$-plane in $\mathbb{R}^3$. Rotate the graph around the $x$-axis and consider the surface it traces out. This is called a surface of revolution.



Figure 2.1: A surface of revolution, obtained by revolving a graph around an 
axis. 

Lemma 37. The area of this surface of revolution can be written as an integral
$$2\pi\int_a^b \phi(t)\sqrt{1 + \dot\phi(t)^2}\,dt.$$

Proof. See http://youtu.be/p_xiCrZz_IU. In order to define the area of the surface of revolution, we approximate it as follows. Take a decomposition of the interval $[a, b]$ into $n$ pieces of width $\epsilon = (b - a)/n$. Let $t_k = a + k\epsilon$, $k = 0, \ldots, n$. We replace the surface by a union of conical frusta $F_k$, $k = 0, \ldots, n - 1$; namely, over the interval $[t_k, t_{k+1}]$ we define $F_k$ to be the surface of revolution obtained by revolving a segment of the tangent line to the graph of $\phi$ at $(t_k, \phi(t_k))$ around the $x$-axis. This frustum looks like a piece of a conical surface connecting a circle $C_{t_k}$ and a circle $C_{t_{k+1}}$ of radius $Q_k$ and $R_k$ respectively.

The surface area of this frustum is given by (we derive this in Section 2.3.2)
$$\operatorname{area}(F_k) = \pi d_k(Q_k + R_k),$$
where $d_k$ is the distance along the surface of the frustum between the two end circles. It is clear from the picture that $Q_k = \phi(t_k)$ and, since the radius $R_k$ is obtained by moving from $(t_k, \phi(t_k))$ along a line of slope $\dot\phi(t_k)$ for time $\epsilon$, we know that $R_k = Q_k + \dot\phi(t_k)\epsilon$. We can compute $d_k$ from Pythagoras's theorem:
$$d_k = \sqrt{\epsilon^2 + \epsilon^2\dot\phi(t_k)^2} = \epsilon\sqrt{1 + \dot\phi(t_k)^2},$$
so the area of the frustum $F_k$ is
$$\operatorname{area}(F_k) = \pi\epsilon\sqrt{1 + \dot\phi(t_k)^2}\left(2\phi(t_k) + \epsilon\dot\phi(t_k)\right).$$
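The area of the whole surface is defined as the limit
$$\lim_{n\to\infty}\sum_{k=0}^{n-1}\operatorname{area}(F_k),$$
which is
$$\lim_{n\to\infty}\sum_{k=0}^{n-1}\pi\epsilon\sqrt{1 + \dot\phi(t_k)^2}\left(2\phi(t_k) + \epsilon\dot\phi(t_k)\right).$$
Let us expand this:
$$\lim_{n\to\infty}\sum_{k=0}^{n-1}2\pi\epsilon\sqrt{1 + \dot\phi(t_k)^2}\,\phi(t_k) + \lim_{n\to\infty}\sum_{k=0}^{n-1}\pi\epsilon^2\sqrt{1 + \dot\phi(t_k)^2}\,\dot\phi(t_k).$$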

We would like to ignore the second term. Since $\phi$ is continuously differentiable, $\dot\phi$ is a continuous function on the closed interval $[a, b]$ and hence it is bounded. That means that $\left|\pi\sqrt{1 + \dot\phi(t_k)^2}\,\dot\phi(t_k)\right| \le C$ for some constant $C$. So the expression inside the limit in the second term is bounded in absolute value by
$$\sum_{k=0}^{n-1}C\epsilon^2 = \sum_{k=0}^{n-1}\frac{C(b - a)^2}{n^2} = \frac{C(b - a)^2}{n},$$
which goes to zero in the limit as $n \to \infty$. Hence we can ignore this term.

The first term is actually just what we would get if we wrote out the definition of the Riemann integral of the function
$$2\pi\phi(t)\sqrt{1 + \dot\phi(t)^2}$$
using the sequence of decompositions we began with. This proves the lemma.

□
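The convergence in the proof of Lemma 37 can be watched numerically. The sketch below uses the illustrative profile $\phi(t) = 1 + t^2$ on $[0, 1]$ (an assumption), sums $\pi d_k(Q_k + R_k)$ over the $n$ tangent-line frusta exactly as in the proof, and compares the result with $2\pi\int_0^1 \phi\sqrt{1 + \dot\phi^2}\,dt$ computed by Simpson's rule. The error shrinks roughly like $1/n$, consistent with a Riemann sum plus an $O(1/n)$ correction.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

a, b = 0.0, 1.0                        # illustrative interval (assumption)

def phi(t):
    return 1 + t * t                   # illustrative profile (assumption)

def phi_dot(t):
    return 2 * t

def frustum_sum(n):
    """Sum of pi * d_k * (Q_k + R_k) over the n tangent-line frusta of the proof."""
    eps = (b - a) / n
    total = 0.0
    for k in range(n):
        t = a + k * eps
        Q = phi(t)
        R = Q + phi_dot(t) * eps
        d = eps * math.sqrt(1 + phi_dot(t) ** 2)
        total += math.pi * d * (Q + R)
    return total

exact = 2 * math.pi * simpson(lambda t: phi(t) * math.sqrt(1 + phi_dot(t) ** 2), a, b)
for n in (10, 100, 1000):
    print(n, frustum_sum(n), exact)
```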

Theorem 38. Assume that there is a $C^2$-function $\phi: [a, b] \to \mathbb{R}$ which gives a surface of revolution of minimal area subject to the boundary conditions $\phi(a) = A$, $\phi(b) = B$. Then $\phi$ has the form
$$\phi(x) = C\cosh((x - D)/C)$$
for some constants $C, D$ to be determined by the boundary conditions.

This curve is called a catenary curve and that's why its surface of revolution is called a catenoid. We'll meet the catenary curve again in Section 2.4.3 below.

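Proof. We are seeking to minimise the functional
$$\frac{1}{2\pi}\operatorname{area}(\phi) = \int_a^b \phi(t)\sqrt{1 + \dot\phi(t)^2}\,dt,$$
i.e.
$$L(p, q, r) = q\sqrt{1 + r^2}.$$
Clearly, $L$ is independent of $p$, so we have Beltrami's identity
$$L - \dot\phi\frac{\partial L}{\partial \dot\phi} = C$$
for some constant $C$. Since $\frac{\partial L}{\partial \dot\phi} = \phi\dot\phi/\sqrt{1 + \dot\phi^2}$, this becomes
$$\phi\sqrt{1 + \dot\phi^2} - \frac{\phi\dot\phi^2}{\sqrt{1 + \dot\phi^2}} = C.$$
Multiplying by $\sqrt{1 + \dot\phi^2}$:
$$\phi(1 + \dot\phi^2) - \phi\dot\phi^2 = C\sqrt{1 + \dot\phi^2},$$
or
$$\phi = C\sqrt{1 + \dot\phi^2}.$$
This rearranges to give
$$\dot\phi = \frac{\sqrt{\phi^2 - C^2}}{C}$$
(taking the positive square root; the negative one leads to the same family of curves). Now
$$t + D' = \int dt = \int \frac{d\phi}{\dot\phi} = \int \frac{C\,d\phi}{\sqrt{\phi^2 - C^2}} = C\cosh^{-1}\left(\frac{\phi}{C}\right)$$
for some constant of integration $D'$. Therefore, setting $D = -D'$, we have
$$\phi(t) = C\cosh\left(\frac{t - D}{C}\right),$$
as required. □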
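A numerical spot-check of Theorem 38, with illustrative constants $C = 0.8$, $D = 0.3$ (assumptions). Along $\phi(t) = C\cosh((t - D)/C)$ the Beltrami quantity $L - \dot\phi\,\partial L/\partial\dot\phi = \phi/\sqrt{1 + \dot\phi^2}$ should equal $C$, and a short computation shows that the full Euler-Lagrange equation for $L = q\sqrt{1 + r^2}$ simplifies to $\phi\ddot\phi = 1 + \dot\phi^2$, whose residual should vanish.

```python
import math

C, D = 0.8, 0.3                        # illustrative constants (assumption)

def phi(t):
    return C * math.cosh((t - D) / C)

def phi_dot(t):
    return math.sinh((t - D) / C)

def phi_ddot(t):
    return math.cosh((t - D) / C) / C

ts = (-1.0, -0.5, 0.0, 0.5, 1.0)
# For L = q sqrt(1 + r^2): L - phi' dL/dphi' simplifies to phi / sqrt(1 + phi'^2).
beltrami_values = [phi(t) / math.sqrt(1 + phi_dot(t) ** 2) for t in ts]
# The Euler-Lagrange equation for this L simplifies to phi * phi'' = 1 + phi'^2.
residuals = [phi(t) * phi_ddot(t) - (1 + phi_dot(t) ** 2) for t in ts]

print(beltrami_values)   # every entry equals C = 0.8 up to rounding
print(residuals)         # every entry is 0 up to rounding
```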

Problem 39. Suppose that $a = -1$, $b = 1$. Show that the equal height case $A = B$ leads us to $\phi(t) = C\cosh(t/C)$. For what values of $A$ does this have a solution? Given the interpretation of a minimal surface of revolution as a soap film, how do you think we should interpret this last answer?



2.3.2 Area of a frustum 




See http://youtu.be/OEeJeBElvgO. A conical frustum is the surface of a truncated cone. Let $Q$ and $R$ denote the radii of the circles at the top and bottom of a frustum (assume $Q < R$). In the proof of Lemma 37 we used the following formula for the surface area of a frustum:
$$\pi d(Q + R),$$
where $d$ is the distance between the top and bottom circles as measured on the surface of the frustum. We will demonstrate this formula here, though the demonstration begins with an unjustified piece of geometric intuition that I won't bother to explain.

The point I will not justify is that you can cut the frustum and lay it out 
flat. The result is a planar region: a segment of an annulus. You could 
demonstrate this by writing down an explicit isometric (distance-preserving) 
map between the cut frustum and the planar region, but I won't do that. 
Given this fact, we will compute the area of the planar region. 







Let $F$ be the fraction of the annulus which is subtended by the cut frustum and let $P$ denote the radius of the inner boundary of the annulus. We know that the distance between the two boundary components of the annulus is $d$. We also know that the length of the inner arc is $2\pi Q$ and the length of the outer arc is $2\pi R$. In other words,
$$2\pi PF = 2\pi Q,$$
$$2\pi(P + d)F = 2\pi R.$$
This gives
$$F = \frac{R - Q}{d} \quad \text{and} \quad P = \frac{Qd}{R - Q}.$$
Therefore the area of the annulus, being the difference of the areas of two discs, is
$$\pi(P + d)^2 - \pi P^2 = \pi d(d + 2P),$$
and the area of the segment is a fraction $F$ of this, giving the area of the frustum as
$$\pi(R - Q)(d + 2P) = \pi(R - Q)\left(d + \frac{2Qd}{R - Q}\right) = \pi d(R - Q + 2Q) = \pi d(Q + R),$$
as claimed.
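As an independent cross-check of the frustum formula, the sketch below compares three computations for illustrative radii $Q = 1$, $R = 2.5$ and slant distance $d = 1.7$ (assumptions): the annulus-segment argument above, the difference of two cone lateral areas using the standard formula $\pi \times \text{radius} \times \text{slant height}$ (with the small cone's slant height $P$ obtained from similar triangles), and the closed form $\pi d(Q + R)$.

```python
import math

def frustum_area_via_annulus(Q, R, d):
    """Flattened frustum: a fraction F of an annulus with radii P and P + d."""
    F = (R - Q) / d
    P = Q * d / (R - Q)
    return F * (math.pi * (P + d) ** 2 - math.pi * P ** 2)

def frustum_area_via_cones(Q, R, d):
    """Big cone minus small cone, using lateral area = pi * radius * slant height."""
    slant_small = Q * d / (R - Q)      # apex to the top circle (similar triangles)
    slant_big = slant_small + d        # apex to the bottom circle
    return math.pi * R * slant_big - math.pi * Q * slant_small

Q, R, d = 1.0, 2.5, 1.7                # illustrative values (assumption), with Q < R
print(frustum_area_via_annulus(Q, R, d),
      frustum_area_via_cones(Q, R, d),
      math.pi * d * (Q + R))
```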



2.3.3 The brachistochrone 




Figure 2.3: The brachistochrone curve is the optimal shape of wire for a fric- 
tionless bead moving under gravity to slide down between two points in order 
to minimise the time taken. 

From the Greek βράχιστος + χρόνος meaning "shortest" + "time", the brachistochrone problem is the problem of finding the shape of a wire joining two points $p$ and $q$ such that a bead running on this wire under gravity (no friction) gets from $p$ to $q$ in the shortest time (assuming it starts off at rest at $p$).

Let's translate $p$ to the origin and suppose that $q$ is at $(x_0, y_0)$, with $x_0 > 0$ (so that $p$ and $q$ are not directly above and below one another) and with $y_0 < 0$ (so that the bead will really fall from $p$ to $q$). The kinetic energy of a bead with speed $v$ is $\frac{1}{2}mv^2$ and the gravitational potential energy of a bead at $(x, y)$ is $mgy$. Therefore
$$\tfrac{1}{2}mv^2 + mgy$$
is constant throughout the trajectory. At $p$ the bead is at rest and $y = 0$, so this whole expression vanishes identically. This means that we can solve for $v$:
$$v = \sqrt{-2gy}.$$
We assume that the wire is the graph of some function $y(x)$. Moreover, we'll assume that this function is invertible, that is, we can express $x$ as a function of $y$ and $dx/dy = (dy/dx)^{-1}$.
The time taken for the bead to move along the wire to $q$ is
$$T = \int_p^q dt = \int_p^q \frac{ds}{\dot s},$$
where $s(t)$ is the distance travelled, so $\dot s = v$. Therefore
$$T = \int_p^q \frac{ds}{v} = \int_p^q \frac{ds}{\sqrt{-2gy}} = \int_p^q \frac{\sqrt{dx^2 + dy^2}}{\sqrt{-2gy}} = \int_0^{x_0} \frac{\sqrt{1 + (y')^2}}{\sqrt{-2gy}}\,dx$$
and we have a calculus of variations problem on our hands: minimise $T$ as a functional of the curve $y(x)$ (under the assumption that $y(0) = 0$ and $y(x_0) = y_0$). The Lagrangian for our problem is
$$L(p, q, r) = \frac{\sqrt{1 + r^2}}{\sqrt{-2gq}}.$$

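The Lagrangian is independent of its first variable and so we have Beltrami's identity
$$\frac{\sqrt{1 + (y')^2}}{\sqrt{-2gy}} - y'\,\frac{y'}{\sqrt{-2gy}\,\sqrt{1 + (y')^2}} = C$$
for some constant $C$. This gives
$$\frac{1}{\sqrt{-2gy}\,\sqrt{1 + (y')^2}} = C,$$
or
$$-2gy\left(1 + (y')^2\right) = \frac{1}{C^2}.$$
Writing this in terms of $x(y)$ (on the first part of the descent, where $x$ increases as $y$ decreases), we get
$$\frac{dx}{dy} = -\frac{\sqrt{-y}}{\sqrt{A + y}},$$
where $A = 1/(2C^2g)$. We can now integrate this by substituting $y = -A\sin^2\theta$ and using the condition $x(0) = 0$ to get
$$x = A(\theta - \sin\theta\cos\theta) = A\sin^{-1}\sqrt{-y/A} - \sqrt{-y(A + y)}.$$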
Remark 40. From Figure 2.3 it should be clear that our assumption (that $x$ is expressible as a function of $y$) is a poor one. Of course, the end result is reassuringly multivalued (it's got a $\sin^{-1}$ and a $\sqrt{\ }$). So although we made an illegal assumption, the answer is actually correct. This is the beauty of the Euler-Lagrange equations: their locality. From a global assumption about a function (that it minimises a functional) you derive an equation for the local behaviour of that extremal function. In the same way, it's kind of magical that light travels in such a way as to minimise the time taken during travel, by only obeying local laws of physics (without knowing beforehand what its endpoint is). If we were feeling careful we could make this brachistochrone argument more rigorous, but for the purposes of this course we'll just be satisfied that we got the right answer.



2.4 Constrained problems 

We have already considered constrained optimisation in the finite-dimensional setting. Exactly the same idea works in infinite-dimensional functional optimisation problems, but it is easier not to have to think in terms of tangencies of infinite-dimensional hypersurfaces! I will explain how the algorithm works and give some examples.



2.4.1 The algorithm 

The algorithm is the same as usual. You have a functional defined on some space of functions $\phi$. This functional is given by an integral

$$F(\phi) = \int_a^b L(t, \phi, \dot\phi)\,dt$$

and you want to find its extrema. However, you also want to impose a constraint on the functions $\phi$. Maybe you want to fix the total arc-length of the graph of $\phi$,

$$\int_a^b \sqrt{1 + \dot\phi^2}\,dt,$$

or the area underneath the graph,

$$\int_a^b \phi\,dt.$$

Notice that these constraints are given by integrals; let's consider the general constraint:

$$\int_a^b M(t, \phi, \dot\phi)\,dt = C.$$
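We impose this by introducing a Lagrange multiplier $\lambda \in \mathbb{R}$. The full functional we want to consider is

$$\mathcal{F}(\phi, \lambda) = \int_a^b L(t, \phi, \dot\phi)\,dt + \lambda\left(\int_a^b M(t, \phi, \dot\phi)\,dt - C\right).$$

Varying with respect to $\lambda$ gives

$$\frac{\partial\mathcal{F}}{\partial\lambda} = \int_a^b M(t, \phi, \dot\phi)\,dt - C = 0.$$

Varying with respect to $\phi$ gives the usual Euler-Lagrange equation but where $L$ is replaced by $L + \lambda(M - C)$ (the constant $\lambda C$ does not affect the equation):

$$\frac{\partial}{\partial\phi}\big(L + \lambda(M - C)\big) = \frac{d}{dt}\left(\frac{\partial}{\partial\dot\phi}\big(L + \lambda(M - C)\big)\right).$$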



2.4.2 The isoperimetric problem 

For an interesting historical survey of this problem and approaches to proving 
it, see this article by V. Blasjo in the American Mathematical Monthly.

The isoperimetric problem is to find a curve in the plane with a fixed length $K$ which bounds an area $A$ which is maximal amongst all areas bounded by plane curves of length $K$. The answer is, of course, a circle of radius $K/2\pi$. We will prove the following:

Theorem 41. If there exists a $2\pi$-periodic (i.e. closed) $C^2$-curve $\gamma\colon \mathbb{R} \to \mathbb{R}^2$ of length $K$ which maximises the area it bounds amongst all closed $C^2$-curves of length $K$, then it is a circle.

Note that, as usual, we do not prove existence of a maximiser! We will simply 
show that the only critical point of a suitable functional on the space of curves 
is the circle. In the problem sheets we will use Fourier analysis to prove the 
stronger statement that the circle is a maximiser. 

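Proof of Theorem 41. If $C$ is a curve parametrised by a $2\pi$-periodic $C^2$-function $\gamma\colon \mathbb{R} \to \mathbb{R}^2$ and $B$ is the bounded component of $\mathbb{R}^2 \setminus C$, then we seek to maximise the area integral $\int_B dx\,dy$ subject to the condition that $\int_0^{2\pi} \sqrt{\dot\gamma_1^2 + \dot\gamma_2^2}\,dt = K$.

Of course, this integral doesn't look quite right yet: we're used to integrating along a curve, not over an area. By Green's theorem (with $C$ oriented anticlockwise):

$$\int_B dx\,dy = \int_B d(x\,dy) = \int_C x\,dy = \int_0^{2\pi} \gamma_1(t)\dot\gamma_2(t)\,dt$$

so our functional becomes

$$\mathcal{F}(\gamma, \lambda) = \int_0^{2\pi} \left(\gamma_1\dot\gamma_2 - \lambda\left(\sqrt{\dot\gamma_1^2 + \dot\gamma_2^2} - \frac{K}{2\pi}\right)\right)dt$$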






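where $\lambda$ is a Lagrange multiplier. Let's write $x$ and $y$ rather than $\gamma_1$ and $\gamma_2$. The Lagrangian is

$$L(x, y, \dot x, \dot y) = x\dot y - \lambda\left(\sqrt{\dot x^2 + \dot y^2} - \frac{K}{2\pi}\right)$$

and the Euler-Lagrange equations are

$$\frac{d}{dt}\left(\frac{\lambda\dot x}{\sqrt{\dot x^2 + \dot y^2}}\right) = -\dot y,$$

$$\frac{d}{dt}\left(\frac{\lambda\dot y}{\sqrt{\dot x^2 + \dot y^2}}\right) = \dot x,$$

$$\int_0^{2\pi} \sqrt{\dot x^2 + \dot y^2}\,dt = K$$

(differentiating $\mathcal{F}$ in the $\gamma_1$, $\gamma_2$ and $\lambda$-directions respectively).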

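Now we do something cunning, which we also did when we dealt with straight lines. If we are given a path $[a, b] \to \mathbb{R}^2$, we can reparametrise that path to get a new path $[a, b] \to \mathbb{R}^2$. This way we can make a particle travel along it at any speed $v(t)$ we want, provided that

$$\int_a^b v(t)\,dt = K,$$

and we certainly don't change the length of the path (the distance the particle travels). For the rigorous justification of this, see Section 2.1.3. In particular, we can parametrise proportionally to arc-length and define our new time coordinate $s(t) = \frac{2\pi}{K}\int_0^t \sqrt{\dot\gamma_1(\tau)^2 + \dot\gamma_2(\tau)^2}\,d\tau$. This satisfies

$$\frac{ds}{dt} = \frac{2\pi}{K}\sqrt{\dot\gamma_1(t)^2 + \dot\gamma_2(t)^2},$$

in other words

$$\left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2 = \frac{K^2}{4\pi^2},$$

or

$$\sqrt{\left(\frac{dx}{ds}\right)^2 + \left(\frac{dy}{ds}\right)^2} = \frac{K}{2\pi}.$$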

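The Euler-Lagrange equations simplify to

$$\frac{d^2x}{ds^2} = -\frac{K}{2\pi\lambda}\frac{dy}{ds}, \qquad \frac{d^2y}{ds^2} = \frac{K}{2\pi\lambda}\frac{dx}{ds}$$

(note that the other equation is automatically satisfied). We have

$$\frac{d^3y}{ds^3} = \frac{K}{2\pi\lambda}\frac{d^2x}{ds^2} = -\frac{K^2}{4\pi^2\lambda^2}\frac{dy}{ds}$$

so $dy/ds$ is a harmonic oscillator (and hence so is $y$, up to an additive constant). This implies that the solution is in fact a circle (could be centred anywhere and parametrised starting from any angle). □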
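Green's theorem also makes it easy to compare curves numerically. The sketch below is a hedged illustration, not part of the proof. It computes the length $K$ of a closed curve and its area $\int_0^{2\pi} \gamma_1\dot\gamma_2\,dt$ with the trapezoid rule. It then reports the scale-invariant ratio $4\pi\cdot\mathrm{area}/K^2$. This ratio equals $1$ for a circle, since radius $K/2\pi$ gives area $K^2/4\pi$. The ellipse with semi-axes $2$ and $1$ is an arbitrary comparison curve, and the function names are hypothetical.

```python
import math


def length_and_area(gamma, dgamma, n=4000):
    """Return (K, area) for a closed curve γ on [0, 2π].

    K = ∫ sqrt(γ1'² + γ2'²) dt and area = ∫ γ1 γ2' dt (Green's theorem with
    anticlockwise orientation). The trapezoid rule on equally spaced nodes is
    extremely accurate for smooth periodic integrands.
    """
    h = 2.0 * math.pi / n
    K = area = 0.0
    for i in range(n):
        t = i * h
        x, _ = gamma(t)
        dx, dy = dgamma(t)
        K += math.hypot(dx, dy) * h
        area += x * dy * h
    return K, area


def isoperimetric_ratio(gamma, dgamma):
    """4π·area / K²: equal to 1 for a circle, smaller for other simple closed curves."""
    K, area = length_and_area(gamma, dgamma)
    return 4.0 * math.pi * area / K ** 2


def circle(t):
    return math.cos(t), math.sin(t)


def circle_velocity(t):
    return -math.sin(t), math.cos(t)


def ellipse(t):
    return 2.0 * math.cos(t), math.sin(t)


def ellipse_velocity(t):
    return -2.0 * math.sin(t), math.cos(t)


print(isoperimetric_ratio(circle, circle_velocity))
print(isoperimetric_ratio(ellipse, ellipse_velocity))
```

The ellipse comes out at roughly $0.84$, below the circle's value of $1$. The inequality itself is proved by the Fourier-analysis argument on the problem sheets; the computation only illustrates it.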

It seems slightly devious that, halfway through a Lagrange-multiplier proof, we reparametrised so we could ignore the Lagrange-multiplier equation. The question I asked myself when I started off writing down this proof was: Why don't we restrict attention to arc-length parametrisations from the very beginning? If you try it, you'll get the wrong answer. The reason is that the space of paths parametrised by arc-length is not a nice flat space. If you add an arbitrary perturbation to $\gamma$, you completely ruin the condition that the curve is parametrised by arc-length. For the same reason, we have to impose the length-fixing constraint with a Lagrange multiplier instead of by starting off with the space of all curves of a fixed length.

So why were we allowed to reparametrise then? Because we knew that, if a 
solution of the constrained problem existed, we could reparametrise it afterwards 
and what we obtained would still be an extremum of the constrained problem. 
So we might as well restrict ourselves to curves parametrised by arc-length, but 
we really need to be using the constrained Euler-Lagrange equation. 



2.4.3 The catenary

Coming from the Latin word "catena" meaning chain, the catenary is the curve which follows a hanging chain. More formally, suppose $(a, A)$ and $(b, B)$ are points in a vertical plane. Suppose there is an infinitely thin chain of uniform density $\rho$ (mass per unit length) whose endpoints are fixed at these points, hanging freely under gravity and having fixed length $K$. Then the chain fits a curve called the catenary, given by the graph of a function $f(t)$, $t \in [a, b]$, which is the solution to a variational problem.

The variational problem is this: minimise the total potential energy of the chain subject to the condition that its length equals $K$. The potential energy of a mass $m$ at height $y$ is $mgy$, so each "infinitesimal element" of our chain $(x(t), y(t)) = (t, f(t))$ has:

• length equal to $\sqrt{dx^2 + dy^2} = \sqrt{1 + \dot f^2}\,dt$,

• mass equal to $\rho\sqrt{1 + \dot f^2}\,dt$,

• potential energy equal to $\rho g f\sqrt{1 + \dot f^2}\,dt$.

Integrating this over the whole interval $[a, b]$ gives the total potential energy

$$\rho g\int_a^b f(t)\sqrt{1 + \dot f(t)^2}\,dt$$

and the total length is

$$K = \int_a^b \sqrt{1 + \dot f(t)^2}\,dt.$$

The functional for the constrained variational problem is therefore

$$\rho g\int_a^b f(t)\sqrt{1 + \dot f(t)^2}\,dt + \lambda\left(\int_a^b \sqrt{1 + \dot f(t)^2}\,dt - K\right)$$

where $\lambda$ is a Lagrange multiplier.

Problem 42. Find the constrained Euler-Lagrange equations for this problem and show that the solution is a catenary curve.



2.5 Several dimensions 

The problems we have considered so far have all concerned optimising functions 
of one variable. The resulting Euler-Lagrange equation is a second-order ordi- 
nary differential equation. Things become even more interesting if we consider 
variational problems with functions of several variables. The theory is almost 
identical except that ordinary derivatives become partial derivatives and the 
equations become harder! 



2.5.1 Euler-Lagrange equation in two variables 

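The generalisation to many variables will hopefully be clear. We consider a functional which takes functions $\phi(x, y)$ of two variables and outputs an integral

$$\iint L(x, y, \phi, \partial_x\phi, \partial_y\phi)\,dx\,dy$$

where $L(p, q, r, s, t)$ is a function of five variables. The usual Euler-Lagrange argument applies but we need to use the several-variable version of the chain rule. This results in the equation

$$\frac{\partial L}{\partial r} = \frac{\partial}{\partial x}\left(\frac{\partial L}{\partial s}\right) + \frac{\partial}{\partial y}\left(\frac{\partial L}{\partial t}\right)$$

where all the derivatives of $L$ are evaluated at

$$(p, q, r, s, t) = \big(x, y, \phi(x, y), (\partial_x\phi)(x, y), (\partial_y\phi)(x, y)\big).$$

In order to reflect this, the equation is usually written

$$\frac{\partial L}{\partial\phi} = \frac{\partial}{\partial x}\left(\frac{\partial L}{\partial\phi_x}\right) + \frac{\partial}{\partial y}\left(\frac{\partial L}{\partial\phi_y}\right)$$

where we are using the notation $\phi_x = \partial\phi/\partial x$, $\phi_y = \partial\phi/\partial y$ simply to minimise the number of $\partial$s floating around.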
2.5.2 Laplace's equation

Consider a plane region $B$ bounded by a closed curve $C$ and let $f\colon C \to \mathbb{R}$ be a function. We seek a $C^2$-function $\phi\colon B \to \mathbb{R}$ satisfying $\phi|_C = f$ and minimising the following integral over all such functions:

$$F(\phi) = \int_B \left((\partial_x\phi)^2 + (\partial_y\phi)^2\right)dx\,dy = \int_B |\nabla\phi|^2\,dx\,dy.$$

Since

$$L = \phi_x^2 + \phi_y^2, \qquad \frac{\partial L}{\partial\phi} = 0, \qquad \frac{\partial L}{\partial\phi_x} = 2\phi_x, \qquad \frac{\partial L}{\partial\phi_y} = 2\phi_y,$$

the Euler-Lagrange equation is

$$0 = \frac{\partial}{\partial x}\left(2\frac{\partial\phi}{\partial x}\right) + \frac{\partial}{\partial y}\left(2\frac{\partial\phi}{\partial y}\right) = 2\Delta\phi$$

where $\Delta$ denotes the Laplace operator

$$\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}.$$

The extremals therefore satisfy $\Delta\phi = 0$ (Laplace's equation) and are harmonic functions. We will meet Laplace's equation again later as the steady-temperature case of the heat equation. Temperature distributions evolve over time via the heat equation

$$\frac{\partial\phi}{\partial t} = \Delta\phi.$$

After a long time the temperature reaches a steady state, so that $\partial\phi/\partial t = 0$, and we achieve a steady-temperature distribution satisfying Laplace's equation. You can see from our variational characterisation that the heat equation is acting to minimise the total gradient of the temperature distribution (as defined by the functional $F(\phi)$). This fits well with our physical intuition.
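The variational characterisation also suggests a simple numerical method. The sketch below is a discrete illustration, not a method from the text. It replaces $F$ by a sum over the edges of a square grid and minimises it one node at a time. With all other nodes frozen, the minimising value at a node is the average of its four neighbours. So these Gauss-Seidel sweeps never increase the discrete energy, and they converge to a discrete harmonic function. Several choices are assumptions made so that the answer can be checked: the unit square, the grid size, the boundary data $x^2 - y^2$ (which happens to be harmonic) and all function names.

```python
def boundary_value(x, y):
    """Boundary data f on the unit square (hypothetical choice: f = x² - y², harmonic)."""
    return x * x - y * y


def dirichlet_energy(phi):
    """Discrete version of F(φ): the sum of squared differences across grid edges.

    On a grid of spacing h, ∫|∇φ|² ≈ Σ ((difference)/h)² · h², so the h's cancel.
    """
    n = len(phi) - 1
    energy = 0.0
    for i in range(n + 1):
        for j in range(n + 1):
            if i < n:
                energy += (phi[i + 1][j] - phi[i][j]) ** 2
            if j < n:
                energy += (phi[i][j + 1] - phi[i][j]) ** 2
    return energy


def solve_laplace(n=16, sweeps=1000):
    """Gauss-Seidel sweeps on an (n+1) x (n+1) grid over [0, 1]²."""
    h = 1.0 / n
    # Boundary nodes carry the data f; interior nodes start from 0.
    phi = [[boundary_value(i * h, j * h) if i in (0, n) or j in (0, n) else 0.0
            for j in range(n + 1)] for i in range(n + 1)]
    energies = [dirichlet_energy(phi)]
    for _ in range(sweeps):
        for i in range(1, n):
            for j in range(1, n):
                # With every other node frozen, the energy is minimised by the
                # average of the four neighbours (a discrete mean-value property).
                phi[i][j] = 0.25 * (phi[i + 1][j] + phi[i - 1][j]
                                    + phi[i][j + 1] + phi[i][j - 1])
        energies.append(dirichlet_energy(phi))
    return phi, energies


phi, energies = solve_laplace()
print(f"energy: {energies[0]:.4f} -> {energies[-1]:.4f}")
```

Because $x^2 - y^2$ is also harmonic for the five-point discrete Laplacian, the sweeps should reproduce it at every grid node.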



2.5.3 Minimal surface equation 

Let $C \subset \mathbb{R}^2$ be a closed curve in the plane bounding a region $B$ and let $f\colon C \to \mathbb{R}$ be a given function. We would like to find an extension $\phi$ of $f$ to the whole of $B$ which minimises the surface area of $\mathrm{Graph}(\phi)$. This is precisely what a soap-film would do if you dipped the graph of $f$ (considered as a wire frame) into a suitably soapy liquid mixture.

Lemma 43. The surface area of $\mathrm{Graph}(\phi)$ is

$$\int_B \sqrt{1 + (\partial_x\phi)^2 + (\partial_y\phi)^2}\,dx\,dy.$$

Proof. I'll only give a sketch proof. Take a tangent plane to the graph of $\phi$ at a point on the graph. Two of the vectors in this tangent plane are $(1, 0, \partial_x\phi)$ and $(0, 1, \partial_y\phi)$. The area of the parallelogram which they span is equal to the magnitude of their cross-product

$$(1, 0, \partial_x\phi) \times (0, 1, \partial_y\phi) = (-\partial_x\phi, -\partial_y\phi, 1)$$

and this magnitude is precisely the integrand. Now approximate the surface by such parallelograms (living over squares of edge-length $\epsilon$ in $\mathbb{R}^2$) and take better and better approximations ($\epsilon \to 0$). This yields the integral. □
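Lemma 43 can be tried out numerically. The sketch below uses a hypothetical helper `graph_area` and the midpoint rule on a rectangle. The rectangle is an assumption, since the lemma allows any region $B$. The helper integrates $\sqrt{1 + \phi_x^2 + \phi_y^2}$ given the two partial derivatives. For a plane the integrand is constant, so the answer can be checked exactly.

```python
import math


def graph_area(phi_x, phi_y, x0, x1, y0, y1, n=200):
    """Midpoint-rule approximation of ∫∫ sqrt(1 + φ_x² + φ_y²) dx dy over [x0, x1] x [y0, y1].

    phi_x and phi_y are the partial derivatives of φ; the integrand is the length
    of the cross product (1, 0, φ_x) × (0, 1, φ_y) = (-φ_x, -φ_y, 1).
    """
    hx = (x1 - x0) / n
    hy = (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            y = y0 + (j + 0.5) * hy
            total += math.sqrt(1.0 + phi_x(x, y) ** 2 + phi_y(x, y) ** 2)
    return total * hx * hy


# The plane φ = 3x + 4y over the unit square: the exact area is sqrt(1 + 9 + 16).
print(graph_area(lambda x, y: 3.0, lambda x, y: 4.0, 0.0, 1.0, 0.0, 1.0), math.sqrt(26.0))
```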

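Theorem 44. If $\phi$ is a $C^2$-function with $\phi|_C = f$ which minimises the surface area of its graph amongst all such functions, then $\phi$ satisfies

$$\frac{\partial^2\phi}{\partial x^2}\left(1 + \left(\frac{\partial\phi}{\partial y}\right)^2\right) + \frac{\partial^2\phi}{\partial y^2}\left(1 + \left(\frac{\partial\phi}{\partial x}\right)^2\right) = 2\frac{\partial\phi}{\partial x}\frac{\partial\phi}{\partial y}\frac{\partial^2\phi}{\partial x\partial y}.$$

The proof is consigned to one of the problem sets: I could not deprive you of the pleasure of this calculation.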
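Without giving the calculation away, the equation of Theorem 44 can be spot-checked on a known example. Scherk's surface, $\phi(x, y) = \log\cos y - \log\cos x$ on $|x|, |y| < \pi/2$, is a classical minimal graph brought in here as an outside example. It makes the two sides agree, while a paraboloid does not. The sketch estimates all derivatives by central differences; the step size and sample points are arbitrary choices.

```python
import math


def minimal_surface_residual(phi, x, y, h=1e-4):
    """Central-difference estimate of
    (1 + φ_y²) φ_xx + (1 + φ_x²) φ_yy - 2 φ_x φ_y φ_xy  at (x, y),
    which vanishes exactly when φ satisfies the equation of Theorem 44.
    """
    px = (phi(x + h, y) - phi(x - h, y)) / (2 * h)
    py = (phi(x, y + h) - phi(x, y - h)) / (2 * h)
    pxx = (phi(x + h, y) - 2 * phi(x, y) + phi(x - h, y)) / h ** 2
    pyy = (phi(x, y + h) - 2 * phi(x, y) + phi(x, y - h)) / h ** 2
    pxy = (phi(x + h, y + h) - phi(x + h, y - h)
           - phi(x - h, y + h) + phi(x - h, y - h)) / (4 * h ** 2)
    return (1 + py ** 2) * pxx + (1 + px ** 2) * pyy - 2 * px * py * pxy


def scherk(x, y):
    """Scherk's surface, defined for |x|, |y| < π/2."""
    return math.log(math.cos(y)) - math.log(math.cos(x))


def paraboloid(x, y):
    """A graph that is not minimal, for comparison."""
    return x * x + y * y


print(minimal_surface_residual(scherk, 0.3, -0.4))     # close to 0
print(minimal_surface_residual(paraboloid, 0.3, 0.2))  # close to 5.04
```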



CHAPTER 2. CALCULUS OF VARIATIONS 



Part II 



Partial differential equations



Chapter 3

Some general theory 



3.1 The definition 



A partial differential equation (PDE) for a function $\phi(x_1, \ldots, x_n)$ of $n$ variables is a relation between the values of $\phi$ and (finitely many of) its partial derivatives (to arbitrary orders) at each point. That is, the relation never relates the partial derivatives or values of $\phi$ at different points: that would be a difference equation, and would probably be harder to solve. We will write $F(\phi)(x, y) = 0$ for this relation at each point $(x, y)$. We should probably write something like

$$F\big(x, y, \phi(x, y), (\partial_x\phi)(x, y), (\partial_y\phi)(x, y), (\partial_x^2\phi)(x, y), \ldots\big) = 0$$

where the dots indicate a finite list, but that would be cumbersome.
For example,

$$F(\phi) = \frac{\partial\phi}{\partial x}(x, y) + \phi(x, y)\frac{\partial\phi}{\partial y}(x, y) - xy = 0$$

or

$$\left(\frac{\partial\phi}{\partial x}\right)^2 + \left(\frac{\partial\phi}{\partial y}\right)^2 = 1$$

or

$$\frac{\partial\phi}{\partial t} = \frac{\partial^2\phi}{\partial x^2}$$

are PDEs, where we have already started to omit the argument from each term. On the other hand,

$$\partial_x\phi(x, y) = \partial_y\phi(x + 1, y - 1)$$

and

$$\partial_x\phi + \partial_x^2\phi + \partial_x^3\phi + \cdots = 0$$

are not PDEs (one is a difference equation, one involves infinitely many derivatives).

The highest order of derivatives which appears in the equation is called the order 
of the equation. We will first consider first-order equations, then second-order 
equations. But here are some more definitions: 

Definition 45. • A PDE $F(\phi) = 0$ is called linear if

$$F(\lambda_1\phi_1 + \lambda_2\phi_2) = \lambda_1F(\phi_1) + \lambda_2F(\phi_2)$$

for all $\lambda_1, \lambda_2 \in \mathbb{R}$. In practical terms, this means that $F(\phi)$ is a sum of terms of the form

$$A\,\partial_1\cdots\partial_k\phi$$

where $A$ depends only on the variables $x_1, \ldots, x_n$ and $\partial_1\cdots\partial_k$ is some string of partial derivatives. We also allow inhomogeneous linear equations, $F(\phi) = R$, where $F$ is linear and $R$ is a function of $x_1, \ldots, x_n$.

• If, moreover, each coefficient $A$ is just a constant, we say the equation has constant coefficients.

• A PDE is called quasilinear if, in each term involving highest (say $k$th) order partial derivatives $\partial_1\cdots\partial_k\phi$, the coefficient doesn't involve any other $k$th derivatives. So

$$\frac{\partial\phi}{\partial x}\frac{\partial\phi}{\partial y} + \frac{\partial\phi}{\partial y} = 0$$

is not quasilinear, because it is first order but there is a term involving a product of first derivatives. The Burgers inviscid fluid equation

$$\phi\frac{\partial\phi}{\partial x} + \frac{\partial\phi}{\partial t} = 0$$

is quasilinear.

• If a PDE is not quasilinear then it is called fully nonlinear. For example the eikonal equation

$$\left(\frac{\partial\phi}{\partial x}\right)^2 + \left(\frac{\partial\phi}{\partial y}\right)^2 = 1,$$

important in geometric optics, or the Monge-Ampère equation

$$\frac{\partial^2\phi}{\partial x^2}\frac{\partial^2\phi}{\partial y^2} - \left(\frac{\partial^2\phi}{\partial x\partial y}\right)^2 = f(x, y),$$

important in geometry.

Most of the PDEs we are going to talk about in this course are linear, with the exception of a few quasilinear and fully nonlinear first-order equations (like the eikonal equation). We will start by talking about first-order equations because there is a very well-developed theory for them. Then we will talk about some specific second-order equations. I will take a moment to discuss the second-order equations we're going to study.
3.2 Quasilinear second order equations in two variables

The general quasilinear second order equation in two variables $(x, y)$ is

$$A\frac{\partial^2\phi}{\partial x^2} + B\frac{\partial^2\phi}{\partial x\partial y} + C\frac{\partial^2\phi}{\partial y^2} = R$$

where $A$, $B$, $C$, $R$ are arbitrary functions of $x$, $y$, $\phi$, $\partial_x\phi$ and $\partial_y\phi$.

Definition 46. Define $\Delta = B^2 - 4AC$. This is a function of $x$, $y$, $\phi$, $\partial_x\phi$, $\partial_y\phi$. However, sometimes it happens that this function is always positive, always negative or always zero. For instance, if the equation is linear and has constant coefficients then $\Delta$ is just a number.

• If $\Delta < 0$ we say that the equation is elliptic,

• if $\Delta = 0$ we say that the equation is parabolic,

• and if $\Delta > 0$ we say that the equation is hyperbolic.

This is only a classification for linear, constant coefficient equations, for which "$\Delta > 0$, $\Delta = 0$ or $\Delta < 0$" makes sense. More generally, it is a guideline: for example, we expect solutions to quasilinear elliptic equations to share certain properties with solutions to linear elliptic equations.

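Here are some examples of linear equations with constant coefficients.
Example 47. Laplace's equation

$$\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} = 0$$

has $A = C = 1$, $B = 0$ so $\Delta = -4 < 0$, which means it is an elliptic equation.
Example 48. The heat equation

$$\frac{\partial\phi}{\partial t} = \frac{\partial^2\phi}{\partial x^2}$$

has $A = 1$, $B = 0$, $C = 0$ (with $x$ and $t$ as the two variables). Therefore $\Delta = 0$ and the equation is parabolic.
Example 49. The wave equation

$$\frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} = \frac{\partial^2\phi}{\partial x^2}$$

has $A = 1/c^2$, $B = 0$, $C = -1$ (moving everything to one side), so $\Delta = 4/c^2 > 0$ and the equation is hyperbolic.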

This classification may seem like something quite formal, but the three types 
of equations really do display very different kinds of behaviour (and quasilinear 
equations of elliptic type really do behave very like linear elliptic equations). 
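For constant-coefficient equations the classification only needs the three leading coefficients, so it is a one-line computation. Below is a minimal sketch applied to Examples 47 to 49, with the wave speed $c = 2$ chosen arbitrarily. The function names are hypothetical. The optional tolerance is an added safeguard for floating-point coefficients, not something in Definition 46.

```python
def discriminant(A, B, C):
    """Δ = B² - 4AC for A φ_xx + B φ_xy + C φ_yy = R (Definition 46)."""
    return B * B - 4 * A * C


def classify(A, B, C, tol=0.0):
    """Type of a linear constant-coefficient equation.

    tol is an optional safeguard (an addition, not from the text) against
    coefficients that should give Δ = 0 but carry rounding error.
    """
    d = discriminant(A, B, C)
    if d < -tol:
        return "elliptic"
    if d > tol:
        return "hyperbolic"
    return "parabolic"


c = 2.0  # wave speed, chosen arbitrarily
examples = {
    "Laplace (Example 47)": (1.0, 0.0, 1.0),
    "heat (Example 48)": (1.0, 0.0, 0.0),
    "wave (Example 49)": (1.0 / c ** 2, 0.0, -1.0),
}
for name, (A, B, C) in examples.items():
    print(f"{name}: Δ = {discriminant(A, B, C)}, {classify(A, B, C)}")
```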
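Problem 50. Show that the (quasilinear) minimal surface equation

$$\frac{\partial^2\phi}{\partial x^2}\left(1 + \left(\frac{\partial\phi}{\partial y}\right)^2\right) + \frac{\partial^2\phi}{\partial y^2}\left(1 + \left(\frac{\partial\phi}{\partial x}\right)^2\right) = 2\frac{\partial\phi}{\partial x}\frac{\partial\phi}{\partial y}\frac{\partial^2\phi}{\partial x\partial y}$$

from Section 2.5.3 is elliptic.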



3.3 Some basic tricks 



Let's restrict our attention to linear, homogeneous equations in two variables with constant coefficients. We take first- and second-order equations and look at some basic tricks for solving them. These tricks amount to making clever changes of coordinates and will generalise to the method of characteristics for first-order equations. They will also make it clear why $\Delta$ is a natural quantity to consider for second order equations.



3.3.1 First order 



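Suppose we have a first-order equation we want to solve:

$$A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y} + C\phi = 0.$$

The expression $A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y}$ is just the partial derivative of $\phi$ in the $(A, B)$-direction. Consider the family of all lines parallel to the vector $(A, B)$, that is the lines

$$\{(x_0 + tA, y_0 + tB) : t \in \mathbb{R}\}.$$

Suppose we have a solution $\phi$ and we restrict it to one of these lines:

$$u(t) = \phi(x_0 + tA, y_0 + tB).$$

Then we know

$$\frac{du}{dt} = A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y}$$

so

$$\frac{du}{dt}(t) + Cu(t) = 0.$$

This is an ordinary differential equation equivalent to

$$\frac{d}{dt}\left(e^{Ct}u(t)\right) = 0$$

and has solution

$$u(t) = Ke^{-Ct}.$$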



Suppose that $A \neq 0$. Let $x_0 = 0$. Then, as $t$ and $y_0$ vary, the point $(tA, y_0 + tB)$ traces out the whole of $\mathbb{R}^2$. Indeed, the point $(x, y)$ corresponds to $t = x/A$, $y_0 = y - xB/A$. Therefore

$$\phi(x, y) = K(y - xB/A)e^{-Cx/A}.$$

Note that we are allowing the 'constant of integration' $K$ to be an arbitrary function of $y - xB/A = y_0$ because $y_0$ was fixed when we restricted to the line $(At, y_0 + tB)$. This makes sense because $y - xB/A$ is annihilated by the directional derivative

$$A\frac{\partial}{\partial x} + B\frac{\partial}{\partial y}.$$

This basic trick is the idea behind the method of characteristics: using the 
original equation, find a system of curves (in this case straight lines) along 
which the PDE reduces to an ODE which we can solve. 
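The formula $\phi(x, y) = K(y - xB/A)e^{-Cx/A}$ can be checked numerically for any choice of $K$. The sketch below picks $A = 2$, $B = 3$, $C = 1$ and $K = \sin$, all arbitrary. It estimates $A\phi_x + B\phi_y + C\phi$ by central differences, and the values should be near zero. It also confirms that along the line $(tA, y_0 + tB)$ the solution reduces to $K(y_0)e^{-Ct}$.

```python
import math


def first_order_solution(A, B, C, K):
    """φ(x, y) = K(y - xB/A) e^{-Cx/A} for A φ_x + B φ_y + C φ = 0 (needs A ≠ 0)."""
    def phi(x, y):
        return K(y - x * B / A) * math.exp(-C * x / A)
    return phi


def residual(A, B, C, phi, x, y, h=1e-5):
    """A φ_x + B φ_y + C φ, with the partial derivatives taken by central differences."""
    phi_x = (phi(x + h, y) - phi(x - h, y)) / (2 * h)
    phi_y = (phi(x, y + h) - phi(x, y - h)) / (2 * h)
    return A * phi_x + B * phi_y + C * phi(x, y)


A, B, C = 2.0, 3.0, 1.0                         # arbitrary constant coefficients
phi = first_order_solution(A, B, C, math.sin)   # K = sin, also arbitrary
for x, y in [(0.7, -0.2), (-1.0, 2.5)]:
    print((x, y), residual(A, B, C, phi, x, y))
```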



3.3.2 Second order 

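Consider the equation

$$A\frac{\partial^2\phi}{\partial x^2} + B\frac{\partial^2\phi}{\partial x\partial y} + C\frac{\partial^2\phi}{\partial y^2} = 0 \qquad (3.1)$$

and suppose that $\Delta = B^2 - 4AC > 0$, i.e. the equation is hyperbolic. Then the quadratic polynomial

$$At^2 + Bt + C = 0 \qquad (3.2)$$

has two distinct real roots

$$t_\pm = \frac{-B \pm \sqrt{\Delta}}{2A}$$

so that $At^2 + Bt + C = A(t - t_-)(t - t_+)$. Define $s_\pm = y + xt_\pm$ so that

$$x = \frac{s_+ - s_-}{t_+ - t_-}, \qquad y = \frac{t_+s_- - t_-s_+}{t_+ - t_-}.$$

Then

$$\frac{\partial}{\partial s_+} = \frac{\partial x}{\partial s_+}\frac{\partial}{\partial x} + \frac{\partial y}{\partial s_+}\frac{\partial}{\partial y} = \frac{1}{t_+ - t_-}\left(\frac{\partial}{\partial x} - t_-\frac{\partial}{\partial y}\right),$$

$$\frac{\partial}{\partial s_-} = \frac{\partial x}{\partial s_-}\frac{\partial}{\partial x} + \frac{\partial y}{\partial s_-}\frac{\partial}{\partial y} = -\frac{1}{t_+ - t_-}\left(\frac{\partial}{\partial x} - t_+\frac{\partial}{\partial y}\right).$$

This means that

$$\frac{\partial}{\partial s_+}\frac{\partial}{\partial s_-} = -\frac{1}{(t_+ - t_-)^2}\left(\frac{\partial}{\partial x} - t_-\frac{\partial}{\partial y}\right)\left(\frac{\partial}{\partial x} - t_+\frac{\partial}{\partial y}\right) = -\frac{1}{A(t_+ - t_-)^2}\left(A\frac{\partial^2}{\partial x^2} + B\frac{\partial^2}{\partial x\partial y} + C\frac{\partial^2}{\partial y^2}\right)$$

(using $t_+ + t_- = -B/A$ and $t_+t_- = C/A$) and hence, if $\phi$ solves (3.1),

$$\frac{\partial^2\phi}{\partial s_+\partial s_-} = 0.$$

The general solution to this equation is a sum of two completely arbitrary differentiable functions:

$$f(s_+) + g(s_-),$$

i.e.

$$f(y + xt_+) + g(y + xt_-).$$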
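Here is a quick numerical check of the general solution $f(y + xt_+) + g(y + xt_-)$. The sketch takes the hyperbolic operator with $A = 1$, $B = 1$, $C = -2$, which is the one appearing in Example 51, and computes $t_\pm$ from (3.2). It picks $f = \sin$ and $g = \exp$ arbitrarily and evaluates the left-hand side of (3.1) by finite differences. A function not of this form, such as $x^2$, does not give zero.

```python
import math


def characteristic_slopes(A, B, C):
    """Real roots t_± of A t² + B t + C = 0 (requires A ≠ 0 and B² - 4AC > 0)."""
    root = math.sqrt(B * B - 4 * A * C)
    return (-B + root) / (2 * A), (-B - root) / (2 * A)


def second_order_residual(A, B, C, phi, x, y, h=1e-3):
    """A φ_xx + B φ_xy + C φ_yy, estimated by central differences."""
    pxx = (phi(x + h, y) - 2 * phi(x, y) + phi(x - h, y)) / h ** 2
    pyy = (phi(x, y + h) - 2 * phi(x, y) + phi(x, y - h)) / h ** 2
    pxy = (phi(x + h, y + h) - phi(x + h, y - h)
           - phi(x - h, y + h) + phi(x - h, y - h)) / (4 * h ** 2)
    return A * pxx + B * pxy + C * pyy


A, B, C = 1.0, 1.0, -2.0                          # Δ = 9 > 0, so hyperbolic
t_plus, t_minus = characteristic_slopes(A, B, C)  # 1 and -2


def phi(x, y):
    # f = sin and g = exp are arbitrary choices of the two free functions
    return math.sin(y + x * t_plus) + math.exp(y + x * t_minus)


print(t_plus, t_minus, second_order_residual(A, B, C, phi, 0.3, 0.1))
```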

If the original equation were instead parabolic then the quadratic polynomial (3.2) would have a repeated real root $t_0$ and the general solution would be $f(y + xt_0) + xg(y + xt_0)$. To see this, use the new coordinates $(u, v) = (y + t_0x, x)$ instead of $s_\pm$.



If the original equation were elliptic then (3.2) would have two non-real complex-conjugate roots $z$ and $z^*$. The general solution is once again

$$f(y + xz) + g(y + xz^*)$$

but the functions $f$ and $g$ (which may be complex) have to be chosen so as to make their sum real. For instance,

$$f(t) = g(t) = t$$

would give

$$y + xz + y + xz^* = 2y + 2x\,\mathrm{Re}(z)$$

as the solution.



Examples 

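Example 51. Suppose we wish to find the general solution to

$$\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial x\partial y} - 2\frac{\partial^2\phi}{\partial y^2} = e^x.$$

The quadratic equation is $t^2 + t - 2 = (t - 1)(t + 2)$ and therefore $t_- = -2$, $t_+ = 1$. Then $s_- = y - 2x$ and $s_+ = y + x$ and the equation becomes

$$-9\frac{\partial^2\phi}{\partial s_+\partial s_-} = \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial x\partial y} - 2\frac{\partial^2\phi}{\partial y^2} = e^x.$$

Note that $x = (s_+ - s_-)/3$ so our equation is

$$\frac{\partial^2\phi}{\partial s_+\partial s_-} = -\frac{e^{(s_+ - s_-)/3}}{9}.$$

Integrating with respect to $s_+$ and $s_-$ gives

$$\phi = e^{(s_+ - s_-)/3} + F(s_+) + G(s_-)$$

or

$$\phi(x, y) = e^x + F(y + x) + G(y - 2x).$$

Now let us see how to solve for $F$ and $G$ when given initial conditions $\phi(x, 0) = M(x)$ and $\partial\phi/\partial y(x, 0) = N(x)$.

Example 52. Say we require $\phi(x, 0) = x^2$ and $\partial\phi/\partial y(x, 0) = x^3$. Then

$$x^2 = e^x + F(x) + G(-2x),$$

$$x^3 = F'(x) + G'(-2x).$$

Differentiating the first of these gives

$$2x = e^x + F'(x) - 2G'(-2x)$$

and now we have two simultaneous equations for $F'$ and $G'$. These give

$$3F'(x) + e^x = 2(x^3 + x)$$

or (dropping the constant of integration, which can be absorbed into $G$)

$$F(x) = \frac{x^4}{6} + \frac{x^2}{3} - \frac{e^x}{3}.$$

Plugging back into the very first equation gives

$$G(-2x) = x^2 - e^x - F(x)$$

or

$$G(-2x) = \frac{2x^2}{3} - \frac{x^4}{6} - \frac{2e^x}{3}.$$

Substituting $z = -2x$ gives

$$G(z) = \frac{z^2}{6} - \frac{z^4}{96} - \frac{2e^{-z/2}}{3}.$$

Hence

$$\phi(x, y) = e^x + \frac{(x + y)^4}{6} + \frac{(x + y)^2}{3} - \frac{e^{x + y}}{3} + \frac{(y - 2x)^2}{6} - \frac{(y - 2x)^4}{96} - \frac{2e^{-(y - 2x)/2}}{3}.$$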
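The algebra in Example 52 is easy to slip on, so it is worth checking the final formula. The sketch below codes $\phi = e^x + F(y + x) + G(y - 2x)$ with $F(z) = z^4/6 + z^2/3 - e^z/3$ and $G(z) = z^2/6 - z^4/96 - 2e^{-z/2}/3$. At a few sample points it tests, by finite differences, that $\phi_{xx} + \phi_{xy} - 2\phi_{yy} = e^x$, $\phi(x, 0) = x^2$ and $\partial\phi/\partial y(x, 0) = x^3$. The step sizes and sample points are arbitrary.

```python
import math


def phi(x, y):
    """Closed-form solution of Example 52."""
    u, v = y + x, y - 2 * x   # s_+ and s_-
    return (math.exp(x)
            + u ** 4 / 6 + u ** 2 / 3 - math.exp(u) / 3
            + v ** 2 / 6 - v ** 4 / 96 - 2 * math.exp(-v / 2) / 3)


def pde_lhs(x, y, h=1e-3):
    """φ_xx + φ_xy - 2 φ_yy via central differences (should equal e^x)."""
    pxx = (phi(x + h, y) - 2 * phi(x, y) + phi(x - h, y)) / h ** 2
    pyy = (phi(x, y + h) - 2 * phi(x, y) + phi(x, y - h)) / h ** 2
    pxy = (phi(x + h, y + h) - phi(x + h, y - h)
           - phi(x - h, y + h) + phi(x - h, y - h)) / (4 * h ** 2)
    return pxx + pxy - 2 * pyy


def phi_y(x, y, h=1e-6):
    """∂φ/∂y via a central difference."""
    return (phi(x, y + h) - phi(x, y - h)) / (2 * h)


for x in (-1.0, 0.0, 0.5, 1.2):
    print(x, pde_lhs(x, 0.3) - math.exp(x), phi(x, 0.0) - x ** 2, phi_y(x, 0.0) - x ** 3)
```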
Chapter 4

First order PDE 



As usual, I will only deal with two-variable equations, mainly for notational 
convenience. 



4.1 Linear case 



We have already seen how to solve first order linear homogeneous PDE with constant coefficients in Section 3.3.1. Let's just quickly extend this to the inhomogeneous case. The equation we want to solve is

$$A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y} + C\phi = D(x, y).$$

The trick we employed before works again: we restrict attention to the line $(At, y_0 + Bt)$, along which the equation for $u(t) = \phi(At, y_0 + Bt)$ reduces to an ODE

$$\frac{du}{dt} + Cu = D(At, y_0 + Bt)$$

or

$$\frac{d}{dt}\left(e^{Ct}u\right) = e^{Ct}D(At, y_0 + Bt).$$

The solution consists of a particular integral

$$u(t) = e^{-Ct}\int_0^t e^{Cs}D(As, y_0 + Bs)\,ds$$

plus a complementary function

$$K(y_0)e^{-Ct}$$

with a 'constant' depending on $y_0$. We can substitute $t = x/A$, $y_0 = y - Bx/A$ as before to solve the original problem:

$$\phi(x, y) = e^{-Cx/A}\int_0^{x/A} e^{Cs}D(As, y - Bx/A + Bs)\,ds + K(y - Bx/A)e^{-Cx/A}.$$
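The formula can be tested in the same spirit as the homogeneous case. In the sketch below, the inhomogeneous term $D(x, y) = xy + 1$, the coefficients and the choice $K = \cos$ are all hypothetical. The integral is done with the composite Simpson rule, and the residual $A\phi_x + B\phi_y + C\phi - D$ is estimated by central differences. The residual should be small, and $\phi(0, y)$ should reproduce $K(y)$.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals.

    Also fine when b < a (the step h is then negative) or a == b (returns 0).
    """
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def make_solution(A, B, C, D, K):
    """φ(x, y) = e^{-Cx/A} ∫_0^{x/A} e^{Cs} D(As, y - Bx/A + Bs) ds + K(y - Bx/A) e^{-Cx/A}."""
    def phi(x, y):
        y0 = y - B * x / A          # which line (At, y0 + Bt) the point lies on
        decay = math.exp(-C * x / A)
        integral = simpson(lambda s: math.exp(C * s) * D(A * s, y0 + B * s), 0.0, x / A)
        return decay * integral + K(y0) * decay
    return phi


def D(x, y):
    """Hypothetical inhomogeneous term, chosen only for illustration."""
    return x * y + 1.0


A, B, C = 1.5, -0.5, 0.8                   # illustrative constant coefficients (A ≠ 0)
phi = make_solution(A, B, C, D, math.cos)  # K = cos fixes the data on the line x = 0


def residual(x, y, h=1e-4):
    """A φ_x + B φ_y + C φ - D via central differences; should be close to 0."""
    phi_x = (phi(x + h, y) - phi(x - h, y)) / (2 * h)
    phi_y = (phi(x, y + h) - phi(x, y - h)) / (2 * h)
    return A * phi_x + B * phi_y + C * phi(x, y) - D(x, y)


print(residual(0.9, -0.3), residual(-1.2, 0.5))
```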



4.2 Linear case, varying coefficients 

Now we allow $A$, $B$, $C$ to depend on $(x, y)$ too (but not yet on $\phi$). The equation we want to solve is

$$A(x, y)\frac{\partial\phi}{\partial x} + B(x, y)\frac{\partial\phi}{\partial y} + C(x, y)\phi = D(x, y).$$

The operator $A(x, y)\frac{\partial}{\partial x} + B(x, y)\frac{\partial}{\partial y}$ has an obvious interpretation as a directional derivative, but for the varying vector field $v(x, y) = (A(x, y), B(x, y))$. We need a replacement for the family of straight lines $(At, y_0 + Bt)$ in the previous case.

Definition 53. An integral curve for the vector field $v(x, y)$ is a curve $\gamma\colon \mathbb{R} \to \mathbb{R}^2$ such that

$$\dot\gamma(t) = v(\gamma(t)),$$

in other words, a curve whose tangent vector is everywhere given by $v$.

Example 54. Suppose that $v(x, y) = (A, B)$, where $A$ and $B$ are constant. Then the lines $(x_0, y_0) + t(A, B)$ are integral curves for $v$.

Example 55. Suppose $v(x, y) = (-y, x)$. Then the family of circles of radius $r$, $\gamma(t) = (r\cos(t), r\sin(t))$, are integral curves for $v$. Indeed,

$$\dot\gamma(t) = (-r\sin(t), r\cos(t)) = v(\gamma(t)).$$

We'll sort out the problem of actually finding the integral curves momentarily. Let's assume for the moment that we have found them. If $\gamma(t)$ is a parametrisation of an integral curve then

$$\frac{d}{dt}\phi(\gamma(t)) = d_{\gamma(t)}\phi(\dot\gamma(t)) = d_{\gamma(t)}\phi(v)$$

which is just the directional derivative of $\phi$ in the $v$-direction at $\gamma(t)$, in other words

$$\frac{d}{dt}\phi(\gamma(t)) = A(\gamma(t))\frac{\partial\phi}{\partial x}(\gamma(t)) + B(\gamma(t))\frac{\partial\phi}{\partial y}(\gamma(t)).$$

Therefore, setting $f(t) = \phi(\gamma(t))$, we have

$$\frac{df}{dt}(t) = D(\gamma(t)) - C(\gamma(t))f(t)$$

which is an ordinary differential equation for $f$.



So the only difficulty is in working out $\gamma$. We have

$$\frac{d\gamma_1}{dt} = A(\gamma_1(t), \gamma_2(t)), \qquad \frac{d\gamma_2}{dt} = B(\gamma_1(t), \gamma_2(t))$$

which is a system of two ODEs for two unknowns $\gamma_1$, $\gamma_2$.

Therefore in order to solve the PDE, we have to solve a pair of coupled ODEs for $\gamma$ and then a further ODE to find $f$ along $\gamma$. This latter ODE introduces a 'constant of integration', but this 'constant' is allowed to vary as you move from one integral curve $\gamma$ to another.

Definition 56 (Characteristics). The curves $\gamma(t)$ are called the characteristics of our equation. Note that when we solve the system of ODEs for $\gamma$ we get two constants of integration and hence a two-parameter family of curves. One of these parameters can always be fixed because it will correspond to reparametrising the characteristics.

Let's do an example.
Example 57. Consider

$$\frac{\partial\phi}{\partial x} + 2x\frac{\partial\phi}{\partial y} = 1.$$

We have $A(x, y) = 1$, $B(x, y) = 2x$ and hence

$$\frac{d\gamma_1}{dt} = 1, \qquad \frac{d\gamma_2}{dt} = 2\gamma_1,$$

which has solution $\gamma_1(t) = t + M$ and $\gamma_2(t) = (t + M)^2 + N$; in other words, the integral curves are parabolae. Let us fix $M = 0$ by translating the time parameter.

Now for each $N$ define $f(t) = \phi(t, t^2 + N)$ and we need to solve $df/dt = 1$. This gives $f(t) = t + K(N)$ or, given that $x = t$ and $y - x^2 = N$,

$$\phi(x, y) = x + K(y - x^2)$$

which is the general solution of our problem.



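Example 58. Consider

$$y\frac{\partial\phi}{\partial x} + x\frac{\partial\phi}{\partial y} + xy\phi = 0.$$

We have $A(x, y) = y$, $B(x, y) = x$ and hence

$$\frac{d\gamma_1}{dt} = \gamma_2, \qquad \frac{d\gamma_2}{dt} = \gamma_1$$

or

$$\frac{d(\gamma_1 - \gamma_2)}{dt} = -(\gamma_1 - \gamma_2), \qquad \frac{d(\gamma_1 + \gamma_2)}{dt} = \gamma_1 + \gamma_2$$

which has solution $\gamma_1 - \gamma_2 = Me^{-t}$, $\gamma_1 + \gamma_2 = Ne^t$ or

$$\gamma(t) = \left(\tfrac{1}{2}\left(Ne^t + Me^{-t}\right), \tfrac{1}{2}\left(Ne^t - Me^{-t}\right)\right)$$

which are the hyperbolae $x^2 - y^2 = MN$. Let us fix $M = 1$ and solve for $f(t) = \phi\left(\tfrac{1}{2}(Ne^t + e^{-t}), \tfrac{1}{2}(Ne^t - e^{-t})\right)$:

$$\frac{df}{dt} = -xyf = -\frac{1}{4}\left(Ne^t + e^{-t}\right)\left(Ne^t - e^{-t}\right)f(t)$$

which has solution

$$f(t) = K(N)\exp\left(-\frac{1}{8}\left(N^2e^{2t} + e^{-2t}\right)\right)$$

or, using the fact that $Ne^t = x + y$ and $e^{-t} = x - y$ (so that $N = x^2 - y^2$),

$$\phi(x, y) = K(x^2 - y^2)\exp\left(-\frac{1}{8}\left((x + y)^2 + (x - y)^2\right)\right) = K(x^2 - y^2)\exp\left(-\frac{x^2 + y^2}{4}\right).$$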
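The method of characteristics also suggests how to solve such equations numerically when no closed form is available: integrate the ODEs for $\gamma$ and $f$ together. The code below is a hedged sketch of this for Example 58, i.e. $y\phi_x + x\phi_y + xy\phi = 0$. The system is $\dot\gamma_1 = \gamma_2$, $\dot\gamma_2 = \gamma_1$, $\dot f = -\gamma_1\gamma_2 f$, integrated with classical fourth-order Runge-Kutta. The starting point is arbitrary. The result is compared with the closed form $K(x^2 - y^2)e^{-(x^2 + y^2)/4}$ for the hypothetical choice $K(u) = u$.

```python
import math


def rk4(rhs, state, t_end, steps=1000):
    """Classical fourth-order Runge-Kutta for state' = rhs(state), from t = 0 to t_end."""
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs([s + h / 2 * k for s, k in zip(state, k1)])
        k3 = rhs([s + h / 2 * k for s, k in zip(state, k2)])
        k4 = rhs([s + h * k for s, k in zip(state, k3)])
        state = [s + h / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
    return state


def characteristic_rhs(state):
    """γ1' = A = γ2, γ2' = B = γ1 and f' = D - C f = -γ1 γ2 f for Example 58."""
    x, y, f = state
    return [y, x, -x * y * f]


def closed_form(x, y):
    """K(x² - y²) exp(-(x² + y²)/4) with the hypothetical choice K(u) = u."""
    return (x * x - y * y) * math.exp(-(x * x + y * y) / 4)


x0, y0 = 1.0, 0.5                                  # arbitrary starting point
x1, y1, f1 = rk4(characteristic_rhs, [x0, y0, closed_form(x0, y0)], t_end=1.0)
print(f"along the characteristic: {f1:.10f}, closed form: {closed_form(x1, y1):.10f}")
```

The characteristic also stays on its hyperbola $x^2 - y^2 = \text{const}$, which is a useful independent check on the integrator.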



4.3 Quasilinear case 



See http://youtu.be/43i-A61t_MI



Now we move on to quasilinear equations

$$A(x, y, \phi)\frac{\partial\phi}{\partial x} + B(x, y, \phi)\frac{\partial\phi}{\partial y} + C(x, y, \phi) = 0. \qquad (4.1)$$



This is slightly harder to deal with. Before, we used a system of two coupled ODEs to find the characteristics and then solved a final ODE to work out $f$ along them. Now we end up with a system of three coupled ODEs for $\gamma_1$, $\gamma_2$, $f$ all at the same time, and we have no obvious way of separating out the three.



4.3.1 Characteristic vector field 



Suppose that $\phi$ is a solution to the equation and that $(x(t), y(t), f(t))$ is the restriction of this solution to a curve $(x(t), y(t))$. The system of ODE is

$$\frac{dx}{dt} = A(x(t), y(t), f(t)) \qquad (4.2)$$

$$\frac{dy}{dt} = B(x(t), y(t), f(t)) \qquad (4.3)$$

$$\frac{df}{dt} = -C(x(t), y(t), f(t)). \qquad (4.4)$$

You can see that we are unlikely to be able to solve for $x$ and $y$ first because $f$ occurs in all three equations.

Suppose this system of equations is satisfied by some family of curves $(x_s(t), y_s(t), z_s(t))$, that is: for each $s$ the curve

$$(x_s(t), y_s(t), z_s(t))$$

satisfies (4.2)-(4.4). This family of curves traces out some surface in $(x, y, z)$-space. Suppose that this surface is the graph of a function $z = \phi(x, y)$. A simple application of the chain rule gives:

$$\frac{d\phi(x_s(t), y_s(t))}{dt} = \frac{dx_s}{dt}\frac{\partial\phi}{\partial x} + \frac{dy_s}{dt}\frac{\partial\phi}{\partial y}$$

and when we substitute in the right-hand sides of (4.2)-(4.4) (the left-hand side is $dz_s/dt = -C$) we see that $\phi$ satisfies (4.1) identically!

This means that if a function $\phi(x, y)$ has the property that its graph

$$\{z = \phi(x, y)\}$$

coincides with the surface traced out by

$$(s, t) \mapsto (x_s(t), y_s(t), z_s(t))$$

then $\phi$ is a solution to Equation (4.1).



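Conversely, suppose $\phi$ is a solution to (4.1) and $x(t)$, $y(t)$ solve (4.2) and (4.3) with $f = \phi(x, y)$. Then $(x(t), y(t), \phi(x(t), y(t)))$, i.e. $z(t) = \phi(x(t), y(t))$, is a solution curve to the system (4.2)-(4.4) of ordinary differential equations, since

$$\frac{dz}{dt} = \frac{dx}{dt}\frac{\partial\phi}{\partial x} + \frac{dy}{dt}\frac{\partial\phi}{\partial y} = A\frac{\partial\phi}{\partial x} + B\frac{\partial\phi}{\partial y} = -C$$

as required.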

Definition 59. The vector field $(A, B, -C)$ in $(x, y, z)$-space is called the characteristic vector field of the quasilinear equation (4.1). The integral curves of $(A, B, -C)$ are called characteristic curves and the projections of the characteristic curves to the $xy$-plane are called the characteristic projections.

In this language, what we have proved is that (a) the graph of a solution to (4.1) is a union of characteristic curves, and that (b) a union of characteristic curves traces out a surface which, when it is a graph, is the graph of a solution to (4.1).



4.3.2 Cauchy data 

We know that the graph of our solution is a union of characteristic curves. We have to pick an 'initial condition' for our PDE. This will take the form of a curve in $(x, y, z)$-space.

Cauchy Problem. Given a curve $\chi\colon \mathbb{R} \to \mathbb{R}^2$ in $(x, y)$-space and a function $f\colon \chi \to \mathbb{R}$ defined over that curve, find a solution $\phi$ to

$$A\phi_x + B\phi_y + C = 0$$

such that $\phi|_\chi = f$. This initial condition is called the Cauchy data and the solution is called its development. The curve $\chi$ is called the Cauchy hypersurface.

Geometrically, solving the Cauchy problem means finding the characteristic curves which pass through the curve

$$s \mapsto (\chi_1(s), \chi_2(s), f(\chi(s)))$$

in $(x, y, z)$-space. The union of these characteristic curves is called a solution surface and is parametrised by the coordinates $(s, t)$ (where $s$ is a coordinate on the Cauchy hypersurface and $t$ is a coordinate along the characteristic curve).



4.3.3 An example: Burgers's equation 



It's high time we solved an example. 



Figure 4.1: (a) The solution surface to $\frac{\partial\phi}{\partial x} + \phi\frac{\partial\phi}{\partial y} = 0$ with the Cauchy data $\phi(0, y) = y$, plotted with the characteristic vector field. This surface is a union of straight lines which are characteristic curves. (b) The characteristic projections of this solution. You can see that they begin to cross at $(-1, 0)$.



Example 60. Take the equation

$$\frac{\partial\phi}{\partial x} + \phi\frac{\partial\phi}{\partial y} = 0,$$

which has $A(x, y, z) = 1$, $B(x, y, z) = z$, $C(x, y, z) = 0$. Therefore the characteristic vector field is

$$\dot x = 1, \qquad \dot y = z, \qquad \dot z = 0,$$

which has integral curves

$$x = t + D, \qquad y = Et + F, \qquad z = E$$

for constants $D$, $E$, $F$. Without loss of generality we can take $D = 0$ because we can reparametrise $t$ so that $x(0) = 0$. These characteristic curves are straight lines, pointing horizontally but with a slope in the $xy$-plane which increases as $z$ increases.

Let's choose some Cauchy data. We'll take as Cauchy hypersurface the straight line $\chi = \{(0, y) \in \mathbb{R}^2\}$, which is parametrised by $s \mapsto (0, s)$. We'll take two different sets of Cauchy data: $\phi(0, s) = f(s)$ with (a) $f(s) = s$, (b) $f(s) = s^2$. This means that in case (a) we are looking for the characteristic curves which pass through the points $(0, s, s)$ in $(x, y, z)$-space. In case (b) we are looking for the characteristic curves which pass through the points $(0, s, s^2)$ in $(x, y, z)$-space.

We first deal with case (a). The point $(0, s, s)$ lies on the characteristic curve $(t, Et + F, E)$ at $t = 0$ if $E = F = s$. In other words, the Cauchy data pick out the characteristic curves

$$t \mapsto (t, st + s, s).$$

This means that $(s, t) \mapsto (t, st + s, s)$ is a parametrisation of our solution surface by the coordinates $(s, t)$.

For case (b), the point $(0, s, s^2)$ lies on the characteristic curve $(t, Et + F, E)$ at $t = 0$ if $F = s$, $E = s^2$. In other words, the Cauchy data pick out the characteristic curves

$$t \mapsto (t, s^2t + s, s^2).$$

This means that $(s, t) \mapsto (t, s^2t + s, s^2)$ is a parametrisation of our solution surface by the coordinates $(s, t)$.

4.3.4 Caustics 

The problem with this method is that it produces solution surfaces which are parametrised by $(s, t)$ instead of by $(x, y)$! It is quite possible that the surface $S \subset \mathbb{R}^3$ traced out as $(s, t)$ vary is not the graph of any function $z = \phi(x, y)$.
Figure 4.2: (a) The solution surface to $\frac{\partial\phi}{\partial x} + \phi\frac{\partial\phi}{\partial y} = 0$ with Cauchy data $\phi(0, y) = y^2$, plotted with the characteristic vector field. You can see the graph starting to bend over and become double-valued. (b) The characteristic projections of this solution. You can see that they are all tangent to the red curve $1 + 4xy = 0$ and that they are starting to cross one another near that curve.
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Indeed, we can see in Figures [4?T] and [4!2] that the Cauchy development is usually 
not a graph. Let's see what goes wrong in the previous example if we try and 
express ^ as a function of {x, y). 

In case (a), the solution surface is parametrised by 

(s, t) ^ {t, st + s, s) 

so y = z(x + 1) and z = y/{x + 1). When x = —\ this doesn't make sense. 
This manifests itself in our picture: all the characteristic projections cross at 
the point (—1,0). 

In case (b), the solution surface is parametrised by

$$(s,t) \mapsto (t, s^2 t + s, s^2)$$

so $y = zx + \sqrt{z}$. If $u = \sqrt{z}$ then $xu^2 + u - y = 0$ and so

$$z = u^2 = \left(\frac{-1 \pm \sqrt{1 + 4xy}}{2x}\right)^2;$$

this doesn't make sense when $1 + 4xy < 0$ and, even when $1 + 4xy > 0$, gives two
possible values for $z = \phi(x,y)$. This manifests itself in our picture: all the
characteristic projections live in the region $1 + 4xy \geq 0$ (indeed, along the
surface $1 + 4xy = 1 + 4t(s^2t + s) = (1 + 2st)^2$) and they are all tangent
to the curve $1 + 4xy = 0$ (in red in the figure). The solution surface simply
doesn't extend into the region $1 + 4xy < 0$.
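To see the two sheets concretely, here is a small Python sketch (standard library only; the function name `z_values` is ours) that solves $xu^2 + u - y = 0$ at a given point and returns the corresponding values of $z = u^2$. It assumes $x \neq 0$.

```python
import math

def z_values(x, y):
    """Possible values of z = u**2 at (x, y), where x*u**2 + u - y = 0.

    Case (b) surface: (s, t) -> (t, s**2 * t + s, s**2), so u = s here.
    Assumes x != 0.
    """
    disc = 1 + 4 * x * y                 # discriminant of the quadratic in u
    if disc < 0:
        return []                        # no real u: the surface misses (x, y)
    roots = {(-1 + math.sqrt(disc)) / (2 * x), (-1 - math.sqrt(disc)) / (2 * x)}
    return sorted(u * u for u in roots)  # one value on the caustic, else two

print(z_values(1.0, 2.0))    # [1.0, 4.0]: two sheets over (1, 2)
print(z_values(1.0, -1.0))   # []: here 1 + 4xy = -3 < 0
```

On the curve $1 + 4xy = 0$ itself the two roots coincide, so only one value is returned.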

The red curve in Figure 4.2 is the envelope of the characteristic curves; in other
words, it is the curve to which all of the characteristics are somewhere tangent.
In the case of the eikonal equation of geometric optics, where characteristics 
are light trajectories, this envelope shows up as a very bright curve called the 
caustic. 

Remark 61. The equation from our example is known as Burgers's equation
(without viscosity) and is usually written

$$\frac{\partial\phi}{\partial t} + \phi\frac{\partial\phi}{\partial x} = 0.$$

The interpretation is that the $x$-axis is filled with a fluid and the velocity of fluid
particles at $x$ at time $t$ is $\phi(x,t)$. The phenomenon of a single-valued solution
becoming multiple-valued is known as shock-wave formation and corresponds to
the fluid "catching up with itself", so that fluid particles with different velocities
coexist at the same $(x,t)$-coordinate.

We want to understand the caustic more generally. It is actually quite easy to 
calculate the caustic straight from the parametrisation of the solution surface 
by (s,t). 
Definition 62. The caustic of a Cauchy development

$$(s,t) \mapsto (u(s,t), v(s,t), w(s,t))$$

is the set of critical values of the map

$$\pi\colon \mathbb{R}^2 \to \mathbb{R}^2, \quad \pi(s,t) = (u(s,t), v(s,t)),$$

in other words, the projection under $\pi$ of the set of points $p \in \mathbb{R}^2$ where

$$\det\begin{pmatrix} \partial_s u & \partial_t u \\ \partial_s v & \partial_t v \end{pmatrix}(p) = 0.$$
I will quickly explain why the caustic is relevant, then we will proceed to
compute it in some examples. Remember that the graph of a function
$\phi\colon \mathbb{R}^2 \to \mathbb{R}$ is just the surface

$$(x,y) \mapsto (x, y, \phi(x,y)).$$

Our solution surfaces have the form

$$(s,t) \mapsto (u(s,t), v(s,t), w(s,t))$$

so that $x = u(s,t)$ and $y = v(s,t)$ are given in terms of $s$ and $t$, rather than the
other way round. Therefore it is not clear whether $s$, $t$ or $w$ can be given in terms of
$x$ and $y$. If $w$ cannot be expressed in terms of $x$ and $y$ then the solution surface
fails to be the graph of a function; the surface "folds over itself". In other words,
the projection

$$\pi(s,t) = (u(s,t), v(s,t)),$$

which forgets the value of $w$, is not one-to-one. Along the fold, the solution
surface admits vertical tangencies, that is, vectors tangent to the solution
surface which point vertically upward in the $w$-direction. So we can detect
the bad behaviour of our solution by looking for vertical tangencies to our
solution surface. It turns out that this is precisely the problem of computing
the determinant in the definition of the caustic.

Lemma 63. If the solution surface $(s,t) \mapsto (u(s,t), v(s,t), w(s,t))$ has a
vertical tangency at some point $(s_0,t_0)$ then $(u(s_0,t_0), v(s_0,t_0))$ is in the caustic.

Proof. Suppose without loss of generality that $s_0 = t_0 = 0$. Consider the
curves $s \mapsto (u(s,0), v(s,0), w(s,0))$ and $t \mapsto (u(0,t), v(0,t), w(0,t))$. These
are curves on the solution surface passing through the point which has a
vertical tangency. Their velocity vectors

$$X = (\partial_s u(0,0), \partial_s v(0,0), \partial_s w(0,0)) \quad\text{and}\quad Y = (\partial_t u(0,0), \partial_t v(0,0), \partial_t w(0,0))$$

span the tangent space of the solution surface at that point. In other words,
all tangent vectors are linear combinations $AX + BY$ of these two. If one
of the tangent directions $AX + BY$ (with $A$, $B$ not both zero) is vertical then
when it is projected into the $(u,v)$-plane it vanishes, and so the corresponding
linear combination vanishes: $A\pi(X) + B\pi(Y) = 0$ (where $\pi$ is the projection
$\pi(u,v,w) = (u,v)$). The vectors $\pi(X)$ and $\pi(Y)$ are therefore linearly dependent.
The vectors $\pi(X)$ and $\pi(Y)$ are the columns of the matrix

$$\begin{pmatrix} \partial_s u & \partial_t u \\ \partial_s v & \partial_t v \end{pmatrix}(p)$$

appearing in the definition of the caustic (where $p$ is the point $(0,0)$). The
determinant of a matrix vanishes if and only if its columns are linearly dependent,
which proves the lemma. □

Example 64. In our earlier example, case (a), we had

$$\pi(s,t) = (u(s,t), v(s,t)) = (t, st + s)$$

and so the determinant of the matrix of partial derivatives is

$$\det\begin{pmatrix} 0 & 1 \\ t + 1 & s \end{pmatrix} = -t - 1,$$

which vanishes when $t = -1$. When $t = -1$, we have $(u(s,-1), v(s,-1)) = (-1, 0)$,
so the caustic is a single point: the point we discovered earlier as the
intersection of all the characteristic projections.

In case (b) we had $(u(s,t), v(s,t)) = (t, s^2 t + s)$ and so the determinant of the
matrix of partial derivatives is

$$\det\begin{pmatrix} 0 & 1 \\ 2st + 1 & s^2 \end{pmatrix} = -2st - 1,$$

which vanishes when $2st = -1$. This means that $(x,y)$ is on the caustic if it
has the form $(u(-1/2t, t), v(-1/2t, t)) = (t, -1/4t)$, so the caustic is the curve
$\{4xy + 1 = 0\}$.
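The determinant in Definition 62 can also be approximated numerically, which is handy when the partial derivatives are messy. The sketch below (our own helper `jacobian_det`, using central differences with an assumed step `h`) checks case (b): the determinant agrees with $-2st - 1$, and points with $2st = -1$ land on $4xy + 1 = 0$. Case (a) can be checked the same way.

```python
def jacobian_det(u, v, s, t, h=1e-6):
    """Central-difference approximation to det [[u_s, u_t], [v_s, v_t]] at (s, t)."""
    us = (u(s + h, t) - u(s - h, t)) / (2 * h)
    ut = (u(s, t + h) - u(s, t - h)) / (2 * h)
    vs = (v(s + h, t) - v(s - h, t)) / (2 * h)
    vt = (v(s, t + h) - v(s, t - h)) / (2 * h)
    return us * vt - ut * vs

# Case (b): pi(s, t) = (t, s**2 * t + s).
def u(s, t):
    return t

def v(s, t):
    return s * s * t + s

print(jacobian_det(u, v, 0.3, 0.7), -2 * 0.3 * 0.7 - 1)   # numerical vs exact

# Points with 2st = -1 map onto the curve 4xy + 1 = 0.
for t in (0.5, 1.0, 2.0, -3.0):
    s = -1 / (2 * t)
    x, y = u(s, t), v(s, t)
    print(t, jacobian_det(u, v, s, t), 1 + 4 * x * y)      # both close to 0
```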

Example 65. Consider the PDE

$$-\sin\phi\,\frac{\partial\phi}{\partial x} + \cos\phi\,\frac{\partial\phi}{\partial y} = 1$$

for which the characteristic vector field is

$$(-\sin z, \cos z, 1).$$

The solution to the characteristic system of ODEs is

$$(x(t), y(t), z(t)) = (x_0 + \cos t, y_0 + \sin t, t)$$

(reparametrising so that $z(0) = 0$) and these characteristic curves are helices,
spiralling upward. Suppose that the initial condition is $\phi(x,0) = 0$, so that the
Cauchy hypersurface is $x(s) = (s,0)$ and the Cauchy data is $\phi(s,0) = 0$. The
characteristic curve $(x_0 + \cos t, y_0 + \sin t, t)$ intersects $(s,0,0)$ at $t = 0$ if $y_0 = 0$
and $x_0 = s - 1$, so the solution surface is parametrised by

$$(s,t) \mapsto (s - 1 + \cos t, \sin t, t).$$

Clearly the solution is $\phi(x,y) = \sin^{-1}(y)$, but this is not single-valued; indeed
there are infinitely many $z$ such that $\sin z = y$. Moreover, the solution goes no
further than $y = 1$: this corresponds to the fact that the helices start overlapping
when $t = \pi/2$ (see Figure 4.3: the projected characteristics are arcs of circles with
centre on the $x$-axis).

The matrix of first derivatives of the projection of the solution surface to the
$(x,y)$-plane is

$$\begin{pmatrix} 1 & -\sin t \\ 0 & \cos t \end{pmatrix}$$

which has determinant $\cos t$. This vanishes along $t = \pi/2$, as suspected (and more
generally along $t = \pi/2 + k\pi$), so the caustic is the line $y = \sin(\pi/2) = 1$
(together with the line $y = -1$, coming from $t = -\pi/2$).
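As a numerical cross-check of Example 65, the sketch below (helper names are ours; central differences and bisection are our choices, not the notes') follows one characteristic from $t = 0$ and locates the first zero of the Jacobian determinant. It should land at $t \approx \pi/2$, where $y = \sin t \approx 1$, whatever the value of $s$.

```python
import math

def surface(s, t):
    """Example 65 solution surface (s, t) -> (x, y, z)."""
    return (s - 1 + math.cos(t), math.sin(t), t)

def projected_det(s, t, h=1e-6):
    """det of the Jacobian of (s, t) -> (x, y), by central differences."""
    xs = (surface(s + h, t)[0] - surface(s - h, t)[0]) / (2 * h)
    ys = (surface(s + h, t)[1] - surface(s - h, t)[1]) / (2 * h)
    xt = (surface(s, t + h)[0] - surface(s, t - h)[0]) / (2 * h)
    yt = (surface(s, t + h)[1] - surface(s, t - h)[1]) / (2 * h)
    return xs * yt - xt * ys

def first_sign_change(f, a, b, tol=1e-10):
    """Bisection on [a, b], assuming f(a) and f(b) have opposite signs."""
    fa = f(a)
    while b - a > tol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fa * fm <= 0:
            b = m
        else:
            a, fa = m, fm
    return 0.5 * (a + b)

s0 = 0.3
t_star = first_sign_change(lambda t: projected_det(s0, t), 0.0, 3.0)
print(t_star, surface(s0, t_star)[1])   # approximately pi/2 and 1
```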



Figure 4.3: The characteristics of $-\sin\phi\,\partial\phi/\partial x + \cos\phi\,\partial\phi/\partial y = 1$
are segments of circles. We see they start to overlap near $y = 1$.



4.4 Fully nonlinear case 



The method works for the fully nonlinear case but the system of ODEs becomes 
yet more complicated. 

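Suppose that $G(x,y,u,p,q)$ is a function of five variables and that
$\phi\colon \mathbb{R}^2 \to \mathbb{R}$ is a function satisfying

$$G(x, y, \phi(x,y), \partial_x\phi(x,y), \partial_y\phi(x,y)) = 0$$

for all $(x,y) \in \mathbb{R}^2$. Now let $\gamma(t) = (x(t), y(t))$ be a curve and restrict $\phi$
to $\gamma$ to obtain a function $u(t)$ as usual. Let $G(t)$ denote

$$G(x(t), y(t), u(t), p(t), q(t))$$

where $p(t) = \partial_x\phi(x(t),y(t))$, $q(t) = \partial_y\phi(x(t),y(t))$. We have

$$\frac{dG}{dt} = \frac{\partial G}{\partial x}\dot x + \frac{\partial G}{\partial y}\dot y + \frac{\partial G}{\partial u}\dot u + \frac{\partial G}{\partial p}\dot p + \frac{\partial G}{\partial q}\dot q$$

and since $\dot u = \partial_x\phi\,\dot x + \partial_y\phi\,\dot y = p\dot x + q\dot y$ this becomes

$$\frac{dG}{dt} = \left(\frac{\partial G}{\partial x} + p\frac{\partial G}{\partial u}\right)\dot x + \left(\frac{\partial G}{\partial y} + q\frac{\partial G}{\partial u}\right)\dot y + \frac{\partial G}{\partial p}\dot p + \frac{\partial G}{\partial q}\dot q.$$

This suggests a system of five coupled ODEs for the five quantities
$(x(t), y(t), u(t), p(t), q(t))$:

$$\frac{dx}{dt} = \frac{\partial G}{\partial p}, \qquad \frac{dp}{dt} = -\left(\frac{\partial G}{\partial x} + p\frac{\partial G}{\partial u}\right),$$

$$\frac{dy}{dt} = \frac{\partial G}{\partial q}, \qquad \frac{dq}{dt} = -\left(\frac{\partial G}{\partial y} + q\frac{\partial G}{\partial u}\right),$$

$$\frac{du}{dt} = p\frac{\partial G}{\partial p} + q\frac{\partial G}{\partial q}.$$

Substituting these into the expression for $dG/dt$ above, the terms cancel in pairs,
so $G$ is constant along solutions. As usual, if we integrate this system of ODEs we
will obtain a curve. Taking a one-parameter family of these integral curves gives a
surface in $(x,y,u,p,q)$-space and when we project to $(x,y,u)$-space we obtain a
surface which, wherever it is a graph, is the graph of a solution $u = \phi(x,y)$.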
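The five characteristic ODEs can be integrated numerically. Below is a minimal sketch using the classical fourth-order Runge-Kutta method (our choice of integrator). The test equation $\partial_x\phi\,\partial_y\phi = \phi$, i.e. $G = pq - u$, is an illustrative assumption, not taken from the notes. Starting from $(x,y,u,p,q) = (0,0,2,1,2)$ it has the exact solution $\phi = (x+2)(y+1)$, so we can check that $G$ stays zero and that $u$ matches $\phi$ at the endpoint.

```python
import math

def characteristic_rhs(state, dG):
    """Right-hand side of the five characteristic ODEs.

    state = (x, y, u, p, q); dG(x, y, u, p, q) returns (G_x, G_y, G_u, G_p, G_q).
    """
    x, y, u, p, q = state
    Gx, Gy, Gu, Gp, Gq = dG(x, y, u, p, q)
    return (Gp, Gq, p * Gp + q * Gq, -(Gx + p * Gu), -(Gy + q * Gu))

def rk4(state, dG, t_end, steps):
    """Classical fourth-order Runge-Kutta integration from t = 0 to t_end."""
    h = t_end / steps
    for _ in range(steps):
        k1 = characteristic_rhs(state, dG)
        k2 = characteristic_rhs(tuple(a + 0.5 * h * b for a, b in zip(state, k1)), dG)
        k3 = characteristic_rhs(tuple(a + 0.5 * h * b for a, b in zip(state, k2)), dG)
        k4 = characteristic_rhs(tuple(a + h * b for a, b in zip(state, k3)), dG)
        state = tuple(a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                      for a, b1, b2, b3, b4 in zip(state, k1, k2, k3, k4))
    return state

# Hypothetical test equation: G = p*q - u, i.e. (d_x phi)(d_y phi) = phi.
def G(x, y, u, p, q):
    return p * q - u

def dG(x, y, u, p, q):
    return (0.0, 0.0, -1.0, q, p)          # (G_x, G_y, G_u, G_p, G_q)

start = (0.0, 0.0, 2.0, 1.0, 2.0)          # G = 1*2 - 2 = 0 initially
end = rk4(start, dG, 1.0, 200)
x, y, u, p, q = end
print(end, G(*end))                        # G stays (numerically) zero
print(u, (x + 2) * (y + 1))                # u agrees with phi = (x + 2)(y + 1)
```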

Example 66. Let us consider the eikonal equation (in units where the speed
of light is 1)

$$\left(\frac{\partial\phi}{\partial x}\right)^2 + \left(\frac{\partial\phi}{\partial y}\right)^2 = 1$$

(so $G(x,y,u,p,q) = p^2 + q^2 - 1$), which describes the time $\phi$ taken by light emitted
normally by some curve $C \subset \mathbb{R}^2$ to reach a point $(x,y) \in \mathbb{R}^2$. To see that
this description is valid, let's consider the corresponding system of ODEs:

$$\dot x = 2p, \quad \dot p = 0, \quad \dot y = 2q, \quad \dot q = 0, \quad \dot u = 2(p^2 + q^2) = 2.$$

Start from the curve $C$ and choose the initial condition $\phi|_C = 0$. We
see that this determines $p$ and $q$ along $C$, namely $(p,q)$ must be (plus or
minus) the unit normal vector to the curve $C$: it has unit length because
$p^2 + q^2 = 1$, and it is normal to $C$ because $(p,q) = \nabla\phi$ and, by our choice of
initial condition, the directional derivative of $\phi$ along $C$ vanishes.
Now along the integral curves of the ODE, $p$ and $q$ do not change and hence
$x$ and $y$ follow the normal line (with speed 2) and the solution to $\dot u = 2$ is just
$u(t) = 2t$. If we followed the normal with speed 1, we would get $u(t) = t$.

This is precisely the statement that the solution of the eikonal equation is the
time taken by light to reach $(x,y)$ from $C$ in a normal direction. The
characteristics are straight lines normal to $C$. Solutions of the eikonal equation
can be very beautiful. Figure 4.4 shows some of the singularities developed
by level sets of $\phi$ corresponding to the initial condition $\phi = 0$ on an ellipse
(that is, the equidistants of an ellipse).
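For a concrete check of Example 66, take $C$ to be the unit circle (our choice; the figure uses an ellipse). The sketch below writes down the explicit integral curves $x = x_0 + 2pt$, $y = y_0 + 2qt$, $u = 2t$ (the function name is ours). Along the outward normals $u$ equals the distance $r - 1$ to the circle, as claimed. Along the inward normals every characteristic reaches the centre when $u = 1$, so for a circle the caustic collapses to a single point, much like case (a) of Example 64.

```python
import math

def eikonal_characteristic(theta, t, outward=True):
    """Integral curve of x' = 2p, y' = 2q, p' = q' = 0, u' = 2 (Example 66),
    starting on the unit circle C at angle theta, with phi|_C = 0."""
    sign = 1.0 if outward else -1.0
    p, q = sign * math.cos(theta), sign * math.sin(theta)   # unit normal to C
    x0, y0 = math.cos(theta), math.sin(theta)
    return x0 + 2 * p * t, y0 + 2 * q * t, 2 * t            # (x, y, u)

# Outward: u should equal the distance from (x, y) to the circle, i.e. r - 1.
for theta in (0.0, 1.0, 2.5):
    x, y, u = eikonal_characteristic(theta, 0.7)
    print(u, math.hypot(x, y) - 1)

# Inward: every characteristic reaches the centre when u = 1 (t = 1/2).
print([eikonal_characteristic(th, 0.5, outward=False)[:2] for th in (0.0, 1.0, 2.5)])
```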
Chapter 5

The heat equation and Laplace's equation

5.1 The heat equation 

We want to study the evolution of a temperature distribution which is not in a
steady state. To simplify matters we restrict to one-dimensional distributions;
these are then represented by functions $\phi\colon [0,L] \times [0,\infty) \to \mathbb{R}$ where the first
coordinate is spatial ($x$) and the second is time ($t$), and the equation governing
their evolution is the heat equation, a parabolic second-order PDE:

$$\frac{\partial\phi}{\partial t} = \frac{\partial^2\phi}{\partial x^2}.$$
5.1.1 Boundary conditions 

We need to impose boundary conditions in the x and t-directions. The t- 
boundary conditions are called initial conditions (for obvious reasons!). We 
will always impose an initial condition like

$$\phi(x,0) = F(x)$$

for some function $F$, and then seek a solution on $[0,L] \times [0,\infty)$.

There are various options for the boundary conditions in the $x$-direction:

• You can specify $\phi(0,t)$ and $\phi(L,t)$. This is called the Dirichlet problem.

• Instead of specifying $\phi$ along one of these boundaries you can specify
$\frac{\partial\phi}{\partial x}(0,t)$ or $\frac{\partial\phi}{\partial x}(L,t)$ (in other words the directional derivative of $\phi$ in a
direction normal to the boundary). This is called a Neumann boundary
condition. You could also impose $\frac{\partial\phi}{\partial t}(x,0)$, i.e. a Neumann initial condition!
But let's not do that.

• Mixtures of the two are perfectly OK.

I was always confused by Neumann boundary conditions. What's the point?
With Dirichlet it's clear what it means, like with a soap film filling in a fixed
boundary shape. Here is a physical situation (heat) in which both are clearly
relevant.

• Dirichlet: The boundary points of our interval are kept at fixed
temperatures $T_1, T_2$, maybe by immersing the region in some vast bath of
refrigerating liquid. Since our interval is one-dimensional, you've
got to imagine immersing it in a one-dimensional bath, so that only the
boundary is being refrigerated...

• Neumann: The boundary of our interval is completely insulated from the
outside world. Insulation means that heat cannot flow into or out of the
boundary. Since heat travels down a temperature gradient, this means
that the directional derivative in the normal direction to the boundary
must vanish.



5.1.2 Strategy 

We now want to solve the heat equation with given boundary and initial con- 
ditions. Fourier solved this problem in his treatise on heat in 1822. His idea 
was: 

• Find lots of solutions to the heat equation. Make sure they fit the bound- 
ary conditions. Maybe they don't fit your initial condition, but don't 
worry. 

• Take linear superpositions of (possibly infinitely many of) these solutions 
in such a way as to fit the resulting function to the initial condition. 

This works because the heat equation is linear, so if $\phi_1, \phi_2, \ldots$ are solutions and
$\lambda_1, \lambda_2, \ldots$ are real numbers then $\sum_{k=1}^\infty \lambda_k\phi_k$ is also a solution (provided that
the partial sums converge in an appropriate way (uniformly) so that derivatives
commute with $\sum_{k=1}^\infty$).

5.1.3 Separation of variables 

We find lots of solutions by using a method called separation of variables: we
seek solutions of the form

$$\phi(x,t) = X(x)T(t)$$

where $X(x)$ is a function of $x$ alone and $T(t)$ is a function of $t$ alone. In this
case the heat equation becomes

$$XT' = X''T.$$

If $X$ and $T$ solve $X'' = \lambda X$ and $T' = \lambda T$ then certainly
$XT' - X''T = \lambda XT - \lambda XT = 0$, so the problem would be solved. I didn't just pull
these equations out of a hat; the argument usually goes like this: divide through
by $XT$ to get

$$\frac{X''}{X} = \frac{T'}{T} = \lambda$$

for some constant $\lambda$ (it has to be constant because $X''/X$ depends only on $x$ and
$T'/T$ depends only on $t$). Multiplying up gives the two equations I mentioned.
This has the disadvantage of dividing by $XT$, which could potentially vanish.
In future we will not be made queasy by this division and, from now on, that's
how we'll derive equations for separation of variables.

Solving for $T$, we get either

$$T(t) = Ke^{\lambda t} \quad\text{or}\quad T(t) = K$$

with $K$ constant, according to whether $\lambda \neq 0$ or $\lambda = 0$ respectively. The $\lambda = 0$
case gives $X'' = 0$ and hence $X = Ax + B$. Since these have no time-dependence
(because $T$ is constant) these are called steady solutions.

The $\lambda > 0$ solutions have $T$ increasing exponentially in time, which seems very
unphysical! A hot plate, left to cool, doesn't suddenly become hotter. We will
see shortly how our boundary conditions allow us to ignore such solutions.

We can write down the general separated solution by solving the ODEs above:

$$\phi_\lambda(x,t) = \begin{cases} \left(A\cos(x\sqrt{-\lambda}) + B\sin(x\sqrt{-\lambda})\right)e^{\lambda t} & \text{if } \lambda < 0,\\ Ax + B & \text{if } \lambda = 0,\\ \left(A\cosh(x\sqrt{\lambda}) + B\sinh(x\sqrt{\lambda})\right)e^{\lambda t} & \text{if } \lambda > 0.\end{cases}$$



5.1.4 Fitting to boundary conditions 

We fit the separated solutions to the boundary conditions. We specialise to
Dirichlet conditions for simplicity. In the question sheets there will be some
with Neumann conditions; the method is the same.

The Dirichlet conditions we impose are $\phi(0,t) = S_0$, $\phi(L,t) = S_1$. The steady
solution

$$\phi_0(x,t) = \left(\frac{S_1 - S_0}{L}\right)x + S_0$$

obviously fits these conditions. By setting $\theta(x,t) = \phi(x,t) - \phi_0(x,t)$ we get a new
solution $\theta$ to the heat equation (because the heat equation is linear) satisfying
the new Dirichlet conditions $\theta(0,t) = 0 = \theta(L,t)$. In other words, we can assume
without loss of generality that $S_0 = S_1 = 0$.

If $\lambda > 0$ then $X(x) = A\cosh(x\sqrt{\lambda}) + B\sinh(x\sqrt{\lambda})$ and so the condition $X(0) = 0$
means that $A = 0$. Also, the condition $X(L) = 0$ gives $B\sinh(L\sqrt{\lambda}) = 0$, but
$\sinh$ only vanishes at $0$, hence $B = 0$ also. Therefore we can ignore separated
solutions with $\lambda > 0$ (as we had hoped) because we cannot fit them to our
boundary conditions.

If $\lambda = -p^2 < 0$ then $X(x) = A\cos(px) + B\sin(px)$ and so the condition $X(0) = 0$
means that $A = 0$. Also, the condition $X(L) = 0$ gives $B\sin(pL) = 0$, and now
$\sin$ vanishes at $n\pi$, $n \in \{1,2,3,\ldots\}$. This means we can have nontrivial solutions
(i.e. with $B \neq 0$) whenever $pL = n\pi$. Thus we have separated solutions

$$\theta_n = e^{-n^2\pi^2 t/L^2}\sin(n\pi x/L)$$

for $n = 1, 2, 3, \ldots$.



5.1.5 Fitting to initial condition 

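Finally we want to take a (possibly infinite) linear combination of separated
solutions $\theta_n$ to fit to the initial condition $\phi(x,0) = F(x)$. We are allowing $\phi_0$
too. So the aim is to write

$$F(x) = \left(\frac{S_1 - S_0}{L}\right)x + S_0 + \sum_{n=1}^\infty A_n\sin(n\pi x/L);$$

in other words, we want to expand $G(x) = F(x) - \left(\frac{S_1 - S_0}{L}\right)x - S_0$ as a
(half-range) Fourier sine series. To do this, extend $G\colon [0,L] \to \mathbb{R}$ to an odd
function $\tilde G\colon [-L,L] \to \mathbb{R}$:

$$\tilde G(x) = \begin{cases} G(x) & \text{if } x \in [0,L],\\ -G(-x) & \text{if } x \in [-L,0],\end{cases}$$

and take

$$A_n = \frac{1}{L}\int_{-L}^{L}\tilde G(x)\sin(n\pi x/L)\,dx = \frac{2}{L}\int_0^L G(x)\sin(n\pi x/L)\,dx.$$

No wonder Fourier invented Fourier series!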
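In practice the coefficients $A_n$ can be computed by quadrature. The sketch below uses composite Simpson's rule (our choice, with an assumed number of subintervals) together with the fact that the integral of the odd extension over $[-L,L]$ is twice the integral over $[0,L]$. It is checked on $x$ on $[0,1]$, whose sine coefficients $2(-1)^{n+1}/(n\pi)$ are standard.

```python
import math

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule with m (even) subintervals."""
    h = (b - a) / m
    inner = sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, m))
    return h / 3 * (f(a) + f(b) + inner)

def sine_coefficients(G, L, N):
    """A_1, ..., A_N for the half-range sine series of G on [0, L].

    The integral of the odd extension over [-L, L] is twice the integral over
    [0, L], so A_n = (2/L) * integral_0^L G(x) sin(n pi x / L) dx.
    """
    return [2 / L * simpson(lambda x: G(x) * math.sin(n * math.pi * x / L), 0.0, L)
            for n in range(1, N + 1)]

# G(x) = x on [0, 1]; here A_n = 2 (-1)^(n+1) / (n pi).
coeffs = sine_coefficients(lambda x: x, 1.0, 4)
print(coeffs)
print([2 * (-1) ** (n + 1) / (n * math.pi) for n in range(1, 5)])
```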

Example 67. Suppose we want to solve $\partial\phi/\partial t = \partial^2\phi/\partial x^2$ for $x \in [0,\pi]$, subject
to $\phi(x,0) = \cos x$, $\phi(0,t) = 1$, $\phi(\pi,t) = -1$. We start by setting

$$\phi_0(x,t) = \frac{S_1 - S_0}{\pi}x + S_0 = 1 - \frac{2x}{\pi}$$

and defining $\theta(x,t) = \phi(x,t) - \phi_0(x,t)$. Then $\theta$ solves the heat equation with
initial condition $\theta(x,0) = \cos x - 1 + 2x/\pi$ and Dirichlet conditions
$\theta(0,t) = 0 = \theta(\pi,t)$. The solution to this problem is

$$\theta(x,t) = \sum_{n=1}^\infty A_n e^{-n^2 t}\sin(nx)$$

where $A_n$ is the $n$th half-range sinusoidal Fourier coefficient of

$$F(x) = \cos x - 1 + 2x/\pi,$$

which is

$$A_n = \frac{2}{\pi}\,\frac{(-1)^n + 1}{n(n^2 - 1)} \quad (n \geq 2), \qquad A_1 = 0,$$

so that

$$\phi(x,t) = 1 - \frac{2x}{\pi} + \frac{2}{\pi}\sum_{n=2}^\infty \frac{(-1)^n + 1}{n(n^2 - 1)}e^{-n^2 t}\sin(nx).$$
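A quick numerical check of Example 67 (helper names are ours): compare the closed-form $A_n$ with Simpson quadrature, then confirm that the truncated series reproduces the initial and boundary data. Truncating at $N = 400$ terms is an arbitrary choice.

```python
import math

def A(n):
    """Closed-form half-range sine coefficient from Example 67 (A_1 = 0)."""
    if n == 1:
        return 0.0
    return 2 * ((-1) ** n + 1) / (math.pi * n * (n * n - 1))

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule with m (even) subintervals."""
    h = (b - a) / m
    inner = sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, m))
    return h / 3 * (f(a) + f(b) + inner)

def F(x):
    return math.cos(x) - 1 + 2 * x / math.pi

def phi(x, t, N=400):
    """Truncated series solution of Example 67."""
    steady = 1 - 2 * x / math.pi
    return steady + sum(A(n) * math.exp(-n * n * t) * math.sin(n * x)
                        for n in range(2, N + 1))

for n in range(1, 7):
    numeric = 2 / math.pi * simpson(lambda x: F(x) * math.sin(n * x), 0.0, math.pi)
    print(n, A(n), numeric)               # closed form vs quadrature

print(phi(1.0, 0.0), math.cos(1.0))       # initial condition (up to truncation)
print(phi(0.0, 0.3), phi(math.pi, 0.3))   # boundary values 1 and -1
```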

5.1.6 The smoothing effect of parabolicity

In the limit as $t \to \infty$ all of the sinusoidal oscillations decay exponentially and
we are left with the steady solution $\phi_0(x) = \left(\frac{S_1 - S_0}{L}\right)x + S_0$. The Fourier modes
with large $n$ decay at a faster rate $e^{-n^2\pi^2 t/L^2}$. This makes physical sense: heat moves
down a temperature gradient to even out the temperature distribution. The more
gradient there is, the faster the heat moves. The higher $n$ is, the wigglier the
Fourier mode, the steeper the gradient, and the faster the heat moves to smooth out
the temperature distribution. This behaviour is typical of parabolic PDEs.
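The smoothing effect can be watched directly in a simple finite-difference simulation. The sketch below uses the explicit forward-time, centred-space scheme (a standard method, but not one from these notes; the ratio $r = \Delta t/\Delta x^2 = 0.4$ is an assumed choice within the stability bound $r \leq 1/2$). Starting from $\sin x + \sin 10x$ on $[0,\pi]$ with zero Dirichlet data, the $n = 10$ mode is damped by about $e^{-100t}$ while the $n = 1$ mode only loses a factor $e^{-t}$.

```python
import math

def ftcs_heat(u0, dx, t_end, r=0.4):
    """Explicit (forward-time, centred-space) scheme for phi_t = phi_xx,
    with phi = 0 held fixed at both ends. Needs r = dt / dx**2 <= 1/2."""
    steps = math.ceil(t_end / (r * dx * dx))
    dt = t_end / steps                   # steps * dt equals t_end up to rounding
    lam = dt / (dx * dx)
    u = list(u0)
    for _ in range(steps):
        u = ([0.0]
             + [u[j] + lam * (u[j - 1] - 2 * u[j] + u[j + 1]) for j in range(1, len(u) - 1)]
             + [0.0])
    return u

M = 200                                             # subintervals of [0, pi]
dx = math.pi / M
xs = [j * dx for j in range(M + 1)]
u0 = [math.sin(x) + math.sin(10 * x) for x in xs]   # gentle mode plus a wiggly mode
t_end = 0.05
u = ftcs_heat(u0, dx, t_end)

exact = [math.exp(-t_end) * math.sin(x) + math.exp(-100 * t_end) * math.sin(10 * x)
         for x in xs]
print(max(abs(a - b) for a, b in zip(u, exact)))    # small discretisation error

def amp(n):
    """Amplitude of sin(n x) in the grid function u (discrete sine transform)."""
    return 2 / M * sum(uj * math.sin(n * x) for uj, x in zip(u, xs))

print(amp(1), amp(10))   # roughly exp(-0.05) and exp(-5)
```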



5.2 Laplace's equation
5.2.1 Harmonic functions

The Laplace operator for functions of $n$ variables $x_1, \ldots, x_n$ is

$$\Delta = \sum_{k=1}^n \frac{\partial^2}{\partial x_k^2}.$$

Recall (we only wrote it out for $n = 2$) that this arises as the Euler-Lagrange
equation for functions defined on a region $U \subset \mathbb{R}^n$ with specified boundary
values, extremising the functional

$$\int_U |\nabla\phi|^2\,dx_1\cdots dx_n$$

known as the Dirichlet energy. Functions which are annihilated by $\Delta$ are called
harmonic functions. The problem of finding a harmonic function on a given
region with specified boundary values is called Dirichlet's problem. The equation
$\Delta\phi = 0$ satisfied by a harmonic function is called Laplace's equation; it is a linear
elliptic second-order equation with constant coefficients.

Example 68. Consider harmonic functions on the interval $[0,1]$. Since these
satisfy $d^2f/dx^2 = 0$ they are just functions of the form $f(x) = ax + b$, whose
graphs are straight line segments from $(0,b)$ to $(1,a+b)$.

This example displays behaviour typical of solutions to elliptic problems.

• After fixing the boundary conditions (in this case fixing $b$ and $a + b$) there
is only a very limited set of solutions (in this case a unique one).

• The solution is heavily dependent on the boundary conditions: if you
change the boundary data locally (i.e. at just one of the boundary points),
the solution is altered everywhere (i.e. arbitrarily close to the other
boundary). This is in contrast with the wave equation; we will see that it takes
time for changes in initial conditions to propagate (at the 'speed of light').

• The solution displays the following mean value property:
$\frac12(f(x) + f(y)) = f\left(\frac{x+y}{2}\right)$. Note that the solution is actually linear, but in
higher dimensions only a weaker mean value property holds, namely that the value
of a harmonic function at the centre of a disc $D \subset U$ is equal to the integral
of the function around the boundary of the disc divided by the circumference
of the disc (that is, its average value on the boundary circle).



5.2.2 The maximum principle

We alluded above to the mean value property:

Theorem 69 (Mean value property of harmonic functions). If $f\colon U \to \mathbb{R}$ is a
harmonic function on a domain $U \subset \mathbb{R}^2$ and $D \subset U$ is a disc centred at $x$ of
radius $r$ then (identifying $\mathbb{R}^2$ with $\mathbb{C}$)

$$f(x) = \frac{\int_0^{2\pi} f(x + re^{i\theta})\,r\,d\theta}{\int_0^{2\pi} r\,d\theta},$$

which is the analogue of the property

$$f\left(\frac{x+y}{2}\right) = \frac12\left(f(x) + f(y)\right)$$

for harmonic functions on an interval. We won't prove this, but note that it
bears a marked (and non-coincidental) resemblance to Cauchy's integral formula
in complex analysis! This formula has the consequence that the maxima and
minima of a harmonic function are achieved along the boundary of its domain.
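The mean value property is easy to test numerically. This sketch (our helper `circle_average`; the trapezoidal rule on the circle and the two test functions are our choices) averages $f$ over a circle and compares with the value at the centre. For the harmonic function $e^x\cos y$ they agree, while for $x^2 + y^2$ (whose Laplacian is $4$) the average exceeds the central value by $r^2$.

```python
import math

def circle_average(f, cx, cy, r, m=64):
    """(1 / (2 pi r)) * integral of f around the circle of radius r centred at
    (cx, cy), via the trapezoidal rule with m equally spaced points."""
    total = 0.0
    for k in range(m):
        angle = 2 * math.pi * k / m
        total += f(cx + r * math.cos(angle), cy + r * math.sin(angle))
    return total / m

def harmonic(x, y):
    return math.exp(x) * math.cos(y)     # real part of e^z, so its Laplacian is 0

def not_harmonic(x, y):
    return x * x + y * y                 # Laplacian is 4

print(circle_average(harmonic, 0.3, -0.2, 0.5), harmonic(0.3, -0.2))          # agree
print(circle_average(not_harmonic, 0.3, -0.2, 0.5), not_harmonic(0.3, -0.2))  # about 0.38 vs 0.13
```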

Corollary 70 (Maximum principle). If $f\colon U \to \mathbb{R}$ is a harmonic function on
a domain $U \subset \mathbb{R}^2$ then the maximum and minimum of $f$ are achieved along the
boundary of $U$. If the maximum or minimum is also achieved in the interior of
$U$ then $f$ is constant.



Proof. Suppose that the maximum is achieved somewhere (say $x$) in the interior
and that the function is not constant. Then there exists a point $y$ such that
$f(y) < f(x)$. Let $U_1, \ldots, U_n$ be a sequence of discs such that $x$ is at the centre
of $U_1$, $y$ is on the boundary of $U_n$ and the boundary of $U_{k-1}$ passes through the
centre of $U_k$ (we are making some topological assumptions about the domain
$U$, like the fact that any two points in $U$ can be connected by such a sequence
of discs; that's a fairly mild condition).

Now let $z_k$ be the point at the centre of $U_k$ (so $x = z_1$), let $z_{n+1} = y$, and let
$r_k$ be the radius of $U_k$. Since $f(p) \leq f(x)$ for all $p \in U$ we have

$$f(x) = \frac{\int_0^{2\pi} f(x + r_1e^{i\theta})\,r_1\,d\theta}{\int_0^{2\pi} r_1\,d\theta} \leq \frac{\int_0^{2\pi} f(x)\,r_1\,d\theta}{\int_0^{2\pi} r_1\,d\theta} = f(x)$$

with equality if and only if $f(p) = f(x)$ for all $p \in \partial U_1$ (using continuity of $f$).
By the mean value property, equality holds, so $f(z_2) = f(x)$. Applying the same
argument again and again we eventually get to $f(y) = f(x)$, which is a contradiction. □

Corollary 71 (Uniqueness of solutions to Dirichlet problem). Suppose that
$\phi_1, \phi_2\colon U \to \mathbb{R}$ are harmonic functions on a domain $U \subset \mathbb{R}^2$ and
$\phi_1|_{\partial U} = \phi_2|_{\partial U}$. Then $\phi_1 = \phi_2$.

Proof. Take $\theta = \phi_1 - \phi_2$. By linearity of the Laplace equation, $\Delta\theta = 0$.
Moreover, $\theta|_{\partial U} = 0$. By the maximum principle, the maximum and minimum of
$\theta$ are achieved along $\partial U$, where it vanishes, hence it vanishes identically and
$\phi_1 = \phi_2$. □



5.2.3 Steady temperature distributions

Suppose that $U \subset \mathbb{R}^n$ is a region in $n$-dimensional Euclidean space (we're
mostly interested in $n = 1, 2$) and that $\phi\colon U \times \mathbb{R} \to \mathbb{R}$ is a time-dependent
temperature distribution (also written $\phi(x,t)$, where $x \in U$ is a spatial variable
and $t \in \mathbb{R}$ represents time). Then $\phi$ evolves according to the heat equation

$$\frac{\partial\phi}{\partial t} = \Delta\phi.$$

We see that harmonic functions correspond to steady temperature distributions,
that is, distributions which don't change over time (if $\partial\phi/\partial t = 0$ then $\Delta\phi = 0$).
From the variational interpretation of Laplace's equation, these are the distributions
which minimise the total squared temperature gradient $\int_U |\nabla\phi|^2$ (unsurprising,
given that heat flows down a temperature gradient (from hot to cold) and tries to
homogenise the temperature, subject to the boundary conditions).
5.2.4 Solution on a rectangular domain

We will now seek to solve the Laplace equation on a rectangular domain
$U = [0,a] \times [0,b] \subset \mathbb{R}^2$. That is, we need to find a harmonic function on the rectangle
with arbitrary boundary data. Our strategy is the same as for the heat equation:

• Find lots of solutions to the Laplace equation. Maybe they don't fit your
boundary values, but don't worry.

• Take linear superpositions of (possibly infinitely many of) these solutions
in such a way as to fit the resulting function to the boundary data.

This works because the Laplace equation is linear, so if $\phi_1, \phi_2, \ldots$ are solutions
and $\lambda_1, \lambda_2, \ldots$ are real numbers then $\sum_{k=1}^\infty \lambda_k\phi_k$ is also a solution (provided
that the partial sums converge in an appropriate way (uniformly) so that $\Delta$
commutes with $\sum_{k=1}^\infty$).



Separation of variables

We find lots of solutions by using a method called separation of variables: we
seek solutions of the form

$$\phi(x,y) = X(x)Y(y)$$

where $X(x)$ is a function of $x$ alone and $Y(y)$ is a function of $y$ alone. Laplace's
equation becomes

$$X''(x)Y(y) + X(x)Y''(y) = 0 \qquad (5.1)$$

where $X''$ (respectively $Y''$) denotes the ordinary second derivative of $X$
(respectively $Y$) as a function of one variable. Divide (5.1) by $XY$ and we get

$$\frac{X''}{X} = -\frac{Y''}{Y}.$$

The left-hand side depends only on $x$, the right-hand side only on $y$ and by
equality of the two sides, both expressions must be constant, equal to $\lambda$. This
gives the two equations $X'' = \lambda X$, $Y'' = -\lambda Y$.

There are now three cases:

• $\lambda = 0$: In this case $X'' = 0$ so $X = Ax + B$, and $Y'' = 0$ so $Y = Cy + D$.

• $\lambda > 0$: In this case $Y$ is a simple harmonic oscillator so

$$Y = C\cos(y\sqrt{\lambda}) + D\sin(y\sqrt{\lambda})$$

and

$$X = A\cosh(x\sqrt{\lambda}) + B\sinh(x\sqrt{\lambda}).$$

• $\lambda < 0$: In this case, $X$ and $Y$ are reversed ($X$ becomes trigonometric, $Y$
becomes hyperbolic):

$$Y = C\cosh(y\sqrt{-\lambda}) + D\sinh(y\sqrt{-\lambda})$$

and

$$X = A\cos(x\sqrt{-\lambda}) + B\sin(x\sqrt{-\lambda}).$$



Boundary conditions

We have various possible boundary conditions we can apply along the edges of
our rectangle.

• Dirichlet boundary conditions: where we fix the values of $\phi(x,0)$,
$\phi(x,b)$, $\phi(0,y)$ or $\phi(a,y)$.

• Neumann boundary conditions: where we fix the values of $\partial\phi/\partial x$ along
a vertical boundary or $\partial\phi/\partial y$ along a horizontal boundary. More generally,
we would fix the directional derivative of $\phi$ in a direction normal to the
boundary.

We can either have pure Dirichlet conditions, pure Neumann conditions or a
mixture (called mixed boundary conditions).



Fitting solutions to boundary data

Now let's examine the various solutions we found by separation of variables and
see what kinds of boundary condition we can satisfy. We will concentrate on
Dirichlet. We denote by $U$ the rectangular region $[0,a] \times [0,b]$ and by $\partial U$ its
boundary.

Fitting to corners

We impose

$$\phi(x,0) = F_1(x), \quad \phi(x,b) = F_2(x), \quad \phi(0,y) = F_3(y), \quad \phi(a,y) = F_4(y).$$

Let's denote the corner values by:

$$S_{00} = F_1(0) = F_3(0), \quad S_{10} = F_1(a) = F_4(0), \quad S_{01} = F_2(0) = F_3(b), \quad S_{11} = F_2(a) = F_4(b),$$

and let's start by fitting the $\lambda = 0$ solutions to this corner data. A single product
$(Ax + B)(Cy + D) = BD + ADx + BCy + ACxy$ cannot match arbitrary corner data (its
coefficients satisfy $(BD)(AC) = (AD)(BC)$), but sums of such $\lambda = 0$ solutions give every
function $c_0 + c_1x + c_2y + c_3xy$. The four corner values give us four linear equations
for $c_0, c_1, c_2, c_3$ which we can solve, and we get

$$\phi_0(x,y) = S_{00} + \frac{S_{10} - S_{00}}{a}x + \frac{S_{01} - S_{00}}{b}y + \frac{S_{11} - S_{10} - S_{01} + S_{00}}{ab}xy.$$

You can check that this satisfies the corner data. Now define

$$\theta(x,y) = \phi(x,y) - \phi_0(x,y)$$

and $\theta$ is a solution to Laplace's equation which solves some new boundary
conditions, but crucially $\theta = 0$ at all the corners of $U$.

In other words, without loss of generality, our solution vanishes at the corners
of $U$.



Patching solutions

We've seen that we can modify $F_1$, $F_2$, $F_3$ and $F_4$ to $F_1^{\mathrm{new}}(x) = F_1(x) - \phi_0(x,0)$,
$F_2^{\mathrm{new}}(x) = F_2(x) - \phi_0(x,b)$, $F_3^{\mathrm{new}}(y) = F_3(y) - \phi_0(0,y)$ and
$F_4^{\mathrm{new}}(y) = F_4(y) - \phi_0(a,y)$, which vanish at the corners. So let's split our problem
into four pieces: let's look for solutions $\theta_1, \theta_2, \theta_3, \theta_4$ such that:

$$\theta_1(x,0) = F_1^{\mathrm{new}}(x), \quad \theta_1(x,b) = 0, \quad \theta_1(0,y) = 0, \quad \theta_1(a,y) = 0$$

and

$$\theta_2(x,0) = 0, \quad \theta_2(x,b) = F_2^{\mathrm{new}}(x), \quad \theta_2(0,y) = 0, \quad \theta_2(a,y) = 0$$

etc. Note that we can only do this because $F_1^{\mathrm{new}}(0) = F_1^{\mathrm{new}}(a) = 0$, etc.;
otherwise the boundary conditions would be discontinuous at the corners.

If we can find $\theta_1$, $\theta_2$, etc. then $\theta_1 + \theta_2 + \theta_3 + \theta_4$ is still a solution to Laplace's
equation and, after adding back $\phi_0$, satisfies all the boundary conditions we wanted
originally. I'll show you how to solve for $\theta_1$: the other solutions can all be obtained
the same way, or by being clever with symmetries of the rectangle.



Solve for $\theta_1$

$\theta_1$ will be written as a (possibly infinite) linear combination of separated
solutions. We want these separated solutions individually to satisfy $X(0) = X(a) = 0$
to ensure that $\theta_1(0,y) = 0 = \theta_1(a,y)$.

If $\lambda > 0$ then $X(x) = A\cosh(x\sqrt{\lambda}) + B\sinh(x\sqrt{\lambda})$. The condition $X(0) = 0$
means that $A = 0$ and the condition $X(a) = 0$ means that $B = 0$. So there are
no separated solutions with $\lambda > 0$ that work.

If $\lambda < 0$ then $X(x) = A\cos(x\sqrt{-\lambda}) + B\sin(x\sqrt{-\lambda})$. The condition $X(0) = 0$
means that $A = 0$ and the condition $X(a) = 0$ means that $B\sin(a\sqrt{-\lambda}) = 0$,
which means that if $B \neq 0$ we need $a\sqrt{-\lambda} = n\pi$ for some $n \in \mathbb{Z}$. Thus
$\lambda = -n^2\pi^2/a^2$.

We therefore want a combination

$$\theta_1(x,y) = \sum_{n=1}^\infty \left(C_n\sinh(n\pi y/a) + D_n\cosh(n\pi y/a)\right)\sin(n\pi x/a)$$

which fits the final two boundary conditions

$$\theta_1(x,0) = F_1^{\mathrm{new}}(x), \quad \theta_1(x,b) = 0.$$

These give us

$$F_1^{\mathrm{new}}(x) = \sum_{n=1}^\infty D_n\sin(n\pi x/a)$$

so that $D_n$ are the coefficients in the Fourier half-range sine series of $F_1^{\mathrm{new}}(x)$, and

$$0 = \sum_{n=1}^\infty \left(C_n\sinh(n\pi b/a) + D_n\cosh(n\pi b/a)\right)\sin(n\pi x/a)$$

which means

$$C_n\sinh(n\pi b/a) + D_n\cosh(n\pi b/a) = 0$$

or

$$C_n = -D_n\coth(n\pi b/a).$$
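Here is a sketch that evaluates the truncated series for $\theta_1$ numerically (helper names are ours; Simpson quadrature for $D_n$ and truncation at $N = 20$ terms are assumptions). To avoid overflow in $\sinh$ for large $n$ it uses the identity $C_n\sinh(ky) + D_n\cosh(ky) = D_n\sinh(k(b-y))/\sinh(kb)$ with $k = n\pi/a$, which follows from $C_n = -D_n\coth(kb)$. The check compares against the closed form of Example 72 ($F_1 = \sin x$, $a = \pi$, $b = 5\pi$), which follows.

```python
import math

def simpson(f, a, b, m=2000):
    """Composite Simpson's rule with m (even) subintervals."""
    h = (b - a) / m
    inner = sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, m))
    return h / 3 * (f(a) + f(b) + inner)

def sinh_ratio(k, y, b):
    """sinh(k (b - y)) / sinh(k b) for 0 <= y <= b, arranged to avoid overflow."""
    return math.exp(-k * y) * (-math.expm1(-2 * k * (b - y))) / (-math.expm1(-2 * k * b))

def theta1(F1, a, b, x, y, N=20):
    """Truncated series for theta_1 on [0, a] x [0, b]: F1 on y = 0, zero elsewhere."""
    total = 0.0
    for n in range(1, N + 1):
        k = n * math.pi / a
        D = 2 / a * simpson(lambda s: F1(s) * math.sin(k * s), 0.0, a)   # D_n
        # With C_n = -D_n coth(k b): C_n sinh(k y) + D_n cosh(k y) = D_n sinh(k (b - y)) / sinh(k b).
        total += D * sinh_ratio(k, y, b) * math.sin(k * x)
    return total

a, b = math.pi, 5 * math.pi

def closed_form(x, y):
    """Example 72: (-coth(5 pi) sinh y + cosh y) sin x."""
    return (-math.sinh(y) / math.tanh(5 * math.pi) + math.cosh(y)) * math.sin(x)

print(theta1(math.sin, a, b, 1.0, 2.0), closed_form(1.0, 2.0))
print(theta1(math.sin, a, b, 1.0, b))   # top edge: 0
```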

Example 72. If $F_1(x) = \sin x$ on $[0, a = \pi] \times [0, b = 5\pi]$ then $D_1 = 1$ and
$C_1 = -\coth(5\pi)$ (all other coefficients vanish), so the solution is

$$\theta_1(x,y) = (-\coth(5\pi)\sinh y + \cosh y)\sin x.$$



Finding the other $\theta_i$

You could of course plough through this procedure for each $\theta_i$. For instance, $\theta_2$
is particularly easy: we need to solve

$$0 = \sum_{n=1}^\infty D_n\sin(n\pi x/a),$$

which gives $D_n = 0$, and

$$F_2^{\mathrm{new}}(x) = \sum_{n=1}^\infty C_n\sinh(n\pi b/a)\sin(n\pi x/a)$$

so that $C_n$ is $1/\sinh(n\pi b/a)$ times the $n$th Fourier coefficient of $F_2^{\mathrm{new}}(x)$.

Solving for $\theta_3$ and $\theta_4$ means you need the $\lambda > 0$ solutions so that $Y$ is
trigonometric, but otherwise the solutions are found exactly the same way.

Sometimes, using a bit of geometric intuition (e.g. reflecting the rectangle) will
allow you to guess the solution.



5.2.5 Some examples

Example 73. Here is an example. On the square $[0,\pi]^2$, take $F_1(x) = \sin x$,
$F_2(x) = 0$, $F_3(y) = 0$, $F_4(y) = \sin y$.

We get $\theta_1(x,y) = (-\coth(\pi)\sinh y + \cosh y)\sin x$ as in the previous example.
However, we could rearrange this using hyperbolic trigonometric identities as
$\frac{\sinh(\pi - y)\sin x}{\sinh\pi}$. If we had instead begun with $F_2(x) = \sin x$, $F_1(x) = 0$, the
solution would have been $\frac{\sinh y\sin x}{\sinh\pi}$. Clearly these two solutions are related by
$y \mapsto \pi - y$, which is a reflection in the horizontal line which cuts the square
in two.

We also get $\theta_4(x,y) = \frac{\sinh x\sin y}{\sinh\pi}$. So the full solution is

$$\phi(x,y) = \frac{\sinh x\sin y}{\sinh\pi} + \frac{\sinh(\pi - y)\sin x}{\sinh\pi}.$$
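As an independent check of Example 73, one can solve a discrete version of the Dirichlet problem. The sketch below uses Gauss-Seidel iteration for the five-point Laplacian on a $16 \times 16$ grid (method, grid size and number of sweeps are our assumptions, not part of the notes). Each update replaces a value by the average of its four neighbours, a discrete form of the mean value property, so the grid also obeys a discrete maximum principle.

```python
import math

def exact(x, y):
    """Closed-form solution of Example 73 on [0, pi]^2."""
    return (math.sinh(x) * math.sin(y) + math.sinh(math.pi - y) * math.sin(x)) / math.sinh(math.pi)

N = 16                                   # grid spacing h = pi / N
h = math.pi / N
grid = [[0.0] * (N + 1) for _ in range(N + 1)]     # grid[i][j] approximates phi(i h, j h)
for k in range(N + 1):
    grid[k][0] = math.sin(k * h)         # F_1(x) = sin x on y = 0
    grid[N][k] = math.sin(k * h)         # F_4(y) = sin y on x = pi
    # F_2 = 0 on y = pi and F_3 = 0 on x = 0 are already in place

for _ in range(2000):                    # Gauss-Seidel sweeps for the 5-point Laplacian
    for i in range(1, N):
        for j in range(1, N):
            grid[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                 + grid[i][j - 1] + grid[i][j + 1])

error = max(abs(grid[i][j] - exact(i * h, j * h)) for i in range(N + 1) for j in range(N + 1))
interior_max = max(grid[i][j] for i in range(1, N) for j in range(1, N))
boundary_max = max(max(grid[k][0], grid[k][N], grid[0][k], grid[N][k]) for k in range(N + 1))
print(error)                             # O(h^2) discretisation error
print(interior_max, boundary_max)        # the maximum sits on the boundary
```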

Example 74. On the square $[0,\pi]^2$, take $F_1(x) = 0$, $F_2(x) = x^2$, $F_3(y) = 0$,
$F_4(y) = y^2$.

We start by finding a $\lambda = 0$ solution to fit the corner data. Consider the function
$xy$. This takes on the right values at the corners of our square (i.e. the values
$0, 0, 0, \pi^2$). If $\phi$ solves our problem then $\theta(x,y) = \phi(x,y) - xy$ solves Laplace's
equation with boundary values

$$F_1^{\mathrm{new}}(x) = 0, \quad F_2^{\mathrm{new}}(x) = x^2 - \pi x, \quad F_3^{\mathrm{new}}(y) = 0, \quad F_4^{\mathrm{new}}(y) = y^2 - \pi y$$

(in particular, the boundary values are zero at the corners). The sinusoidal
Fourier series of $x^2 - \pi x$ on $[0,\pi]$ (i.e. the half-range series obtained by extending
it to an odd function on $[-\pi,\pi]$) is

$$-\frac{8}{\pi}\sum_{k=1}^\infty \frac{\sin((2k-1)x)}{(2k-1)^3}$$

and because the values in the corners are now zero we can apply our earlier
solution:

$$\theta(x,y) = -\frac{8}{\pi}\sum_{k=1}^\infty\left(\frac{\sin((2k-1)x)\sinh((2k-1)y)}{(2k-1)^3\sinh((2k-1)\pi)} + \frac{\sin((2k-1)y)\sinh((2k-1)x)}{(2k-1)^3\sinh((2k-1)\pi)}\right).$$

To get $\phi$ rather than $\theta(x,y) = \phi(x,y) - xy$ we need to add $xy$, so the final answer is

$$\phi(x,y) = xy - \frac{8}{\pi}\sum_{k=1}^\infty\left(\frac{\sin((2k-1)x)\sinh((2k-1)y)}{(2k-1)^3\sinh((2k-1)\pi)} + \frac{\sin((2k-1)y)\sinh((2k-1)x)}{(2k-1)^3\sinh((2k-1)\pi)}\right).$$
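A numerical sanity check of Example 74 (helper names are ours; truncation at $K = 400$ terms is arbitrary). Writing $\sinh(ny)/\sinh(n\pi)$ in terms of exponentials avoids overflow for large $n$. The truncated series should reproduce $x^2$ on the top edge, $y^2$ on the right edge, zero on the other two edges, and have vanishing Laplacian inside.

```python
import math

def sinh_ratio(n, y, L=math.pi):
    """sinh(n y) / sinh(n L) for 0 <= y <= L, computed without overflow."""
    return math.exp(n * (y - L)) * (-math.expm1(-2 * n * y)) / (-math.expm1(-2 * n * L))

def phi(x, y, K=400):
    """Truncated series for Example 74 on [0, pi]^2."""
    total = 0.0
    for k in range(1, K + 1):
        n = 2 * k - 1
        total += (math.sin(n * x) * sinh_ratio(n, y) + math.sin(n * y) * sinh_ratio(n, x)) / n ** 3
    return x * y - 8 / math.pi * total

print(phi(math.pi / 2, math.pi), (math.pi / 2) ** 2)   # top edge: phi = x^2
print(phi(math.pi, 1.0), 1.0)                          # right edge: phi = y^2
print(phi(1.0, 0.0), phi(0.0, 2.0))                    # bottom and left edges: 0

# Five-point finite-difference Laplacian at an interior point (should be ~0).
x0, y0, d = 1.0, 1.5, 1e-3
lap = (phi(x0 + d, y0) + phi(x0 - d, y0) + phi(x0, y0 + d) + phi(x0, y0 - d)
       - 4 * phi(x0, y0)) / d ** 2
print(lap)
```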



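5.3 Eigenfunction expansions

This section is nonexaminable. It attempts to answer the question "Why is
$\sin(n\pi x/L)$ turning up all over the place?".

Consider the vector space $Y$ of functions $[0,\pi] \to \mathbb{R}$ and the subspace

$$X = \{F\colon [0,\pi] \to \mathbb{R} : F(0) = F(\pi) = 0\}$$

of functions on $[0,\pi]$ which vanish at $0$ and $\pi$. We can define a linear operator

$$\frac{d^2}{dx^2}\colon X \to Y$$

which takes $F$ to the second derivative of $F$. Note that $F''(0)$ and $F''(\pi)$
need not vanish! Nonetheless, we can ask for eigenvalues and eigenvectors
(eigenfunctions) of $d^2/dx^2$, that is, functions $F \in X$, not identically zero, such that

$$F'' = d^2F/dx^2 = \lambda F$$

for some $\lambda \in \mathbb{R}$. First notice that

$$\lambda\int_0^\pi F^2\,dx = \int_0^\pi FF''\,dx = -\int_0^\pi F'F'\,dx < 0$$

by integrating by parts (the boundary term $[FF']_0^\pi$ vanishes because $F(0) = F(\pi) = 0$,
and $F'$ cannot vanish identically unless $F \equiv 0$). This implies $\lambda < 0$. The equation
$F'' = \lambda F$ has solutions $A\sin(x\sqrt{-\lambda}) + B\cos(x\sqrt{-\lambda})$ and, for $A \neq 0$, these vanish
at $0$ and $\pi$ if and only if $B = 0$ and $\sqrt{-\lambda} = n$ for some $n \in \mathbb{Z}$.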
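A finite-dimensional shadow of this computation: replace $d^2/dx^2$ by the second difference $(F_{j-1} - 2F_j + F_{j+1})/h^2$ on a grid in $[0,\pi]$ with $F_0 = F_M = 0$. The sketch below (our construction, not from the notes) checks that sampled $\sin(nx)$ is an exact eigenvector, with eigenvalue $-(4/h^2)\sin^2(nh/2)$, which tends to $-n^2$ as $h \to 0$.

```python
import math

def second_difference(F, h):
    """Apply the discrete d^2/dx^2 to samples F[0..M], assuming F[0] = F[M] = 0."""
    return [(F[j - 1] - 2 * F[j] + F[j + 1]) / (h * h) for j in range(1, len(F) - 1)]

M = 100
h = math.pi / M
xs = [j * h for j in range(M + 1)]

def eigen_ratios(n):
    """Ratios (D F)_j / F_j at interior points where F_j is not (numerically) zero."""
    F = [math.sin(n * x) for x in xs]
    DF = second_difference(F, h)
    return [d / f for d, f in zip(DF, F[1:-1]) if abs(f) > 1e-8]

for n in (1, 2, 3, 5):
    ratios = eigen_ratios(n)
    predicted = -4 / h ** 2 * math.sin(n * h / 2) ** 2
    print(n, min(ratios), max(ratios), predicted)   # all equal, close to -n**2
```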

Now consider the heat equation in three variables 

d(t>/dt = 

where ^{t, x, y) is now a function of three variables and A is the Laplacian. 
Seeking separated solutions (j) = T{t)M{x,y) gives 

T'/T = A{M)/M = X 

where A is constant and hence T = e^*, A(M) = AM. So what are the 
eigenfunctions of A? Well we worked them out on Sheet 7. As usual there's 
a discrete set of them M„ corresponding to a discrete collection of eigenvalues 
A„. To replace the usual Fourier expansion of a function F in terms of sines, 
there is now an eigenfunction expansion of F as a sum of eigenfunctions of 
M„: 

F{x,y) = Y,^nMn{x,y) 

n 
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and A{F) ~ yl„A„M„(a::, j/). Thus a solution to the heat equation with 
initial conditions (j){0,x,y) = M{x,y) is given by 

n 

and the nth mode of F decays in time with rate A„. 

Returning to 1-dimensional problems, we might try to solve the equation

$$\phi_t = f(x)\phi_{xx} + g(x)\phi_x + h(x)\phi$$

by separating variables, $\phi(x,t) = T(t)X(x)$. The result is

$$\frac{T'}{T} = f(x)\frac{X''}{X} + g(x)\frac{X'}{X} + h(x) = \lambda$$

where $\lambda$ is constant. Solving problems like

$$f(x)X'' + g(x)X' + h(x)X = \lambda X$$

is the subject of Sturm-Liouville theory. You've already met an equation like this on Sheet 4, namely Legendre's equation

$$(1 - x^2)X'' - 2xX' = -\ell(\ell+1)X$$

where $\lambda = -\ell(\ell+1)$, for integers $\ell \geq 0$, are the eigenvalues.
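As an illustration (not part of the original notes), the sketch below computes Legendre polynomials $P_\ell$ using Bonnet's recursion $(n+1)P_{n+1} = (2n+1)xP_n - nP_{n-1}$. It then checks numerically two things: that the $P_\ell$ solve Legendre's equation with $\lambda = -\ell(\ell+1)$, and that distinct ones are orthogonal on $[-1,1]$. Note that $\int_{-1}^1 P_\ell^2\,dx = 2/(2\ell+1)$ rather than $1$, so they still need normalising. The helper names `legendre`, `legendre_residual` and `simpson` are illustrative.

```python
import math


def legendre(l, x):
    """Legendre polynomial P_l(x) via Bonnet's recursion (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def legendre_residual(l, x, h=1e-4):
    """(1 - x^2) P_l'' - 2x P_l' + l(l+1) P_l at x, with derivatives by central differences."""
    p = lambda s: legendre(l, s)
    d1 = (p(x + h) - p(x - h)) / (2 * h)
    d2 = (p(x + h) - 2 * p(x) + p(x - h)) / h ** 2
    return (1 - x * x) * d2 - 2 * x * d1 + l * (l + 1) * p(x)


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


print(legendre(2, 0.5))                                                # P_2(0.5) = -0.125
print(simpson(lambda x: legendre(2, x) * legendre(3, x), -1.0, 1.0))  # approximately 0
print(simpson(lambda x: legendre(3, x) ** 2, -1.0, 1.0))               # approximately 2/7
```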

Sturm-Liouville theory gives conditions on $f, g, h$ which ensure the existence of an infinite discrete set of eigenvalues $\lambda_n$ with eigenfunctions $y_n$. The key properties which let us mimic Fourier theory are orthogonality,

$$\int_a^b y_n(x)\,y_m(x)\,dx = \delta_{mn}$$

where $[a,b]$ is the interval on which the problem is posed. This holds after normalising, and in general a weight function $w(x)$ must be inserted in the integrand. It plays the same role as the integrals of sine and cosine which let us compute Fourier series. The second key property is completeness, which asserts the existence of an expansion

$$F(x) = \sum_n A_n y_n(x)$$

for any reasonable (e.g. square-integrable) function $F$. This orthogonality can be understood as an infinite-dimensional analogue of the following theorem from linear algebra:

Theorem 75. Suppose $A$ is a symmetric matrix. If $v$ and $w$ are eigenvectors of $A$ associated to different eigenvalues $\mu \neq \lambda$ respectively, then $v \cdot w = 0$.

Proof. Consider

$$v \cdot Aw = (A^T v)\cdot w = (Av)\cdot w.$$

Since $v$ is a $\mu$-eigenvector of $A$ and $w$ is a $\lambda$-eigenvector of $A$, this equation becomes

$$\mu(v \cdot w) = \lambda(v \cdot w).$$

If $\lambda \neq \mu$ this implies $v \cdot w = 0$. □
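Here is a finite-dimensional instance of exactly this theorem, given as an illustrative sketch that is not from the notes. Replace $d^2/dx^2$ on $[0,\pi]$ by the usual second-difference matrix at $N$ interior grid points $x_j = jh$, with $h = \pi/(N+1)$. The result is a symmetric matrix whose eigenvectors are the sampled sines $(\sin(nx_1),\ldots,\sin(nx_N))$, with eigenvalues $-(4/h^2)\sin^2(nh/2) \approx -n^2$. Theorem 75 then predicts that different sampled sines are orthogonal. All helper names are hypothetical.

```python
import math


def second_difference_matrix(N):
    """N x N matrix of the second-difference approximation to d^2/dx^2 on [0, pi].

    Uses the interior grid points x_j = j*h (j = 1..N, h = pi/(N+1)); the zero
    boundary values F(0) = F(pi) = 0 are built in by omitting x_0 and x_{N+1}.
    """
    h = math.pi / (N + 1)
    D = [[0.0] * N for _ in range(N)]
    for i in range(N):
        D[i][i] = -2.0 / h ** 2
        if i > 0:
            D[i][i - 1] = 1.0 / h ** 2
        if i < N - 1:
            D[i][i + 1] = 1.0 / h ** 2
    return D


def sampled_sine(n, N):
    """The vector (sin(n x_1), ..., sin(n x_N))."""
    h = math.pi / (N + 1)
    return [math.sin(n * j * h) for j in range(1, N + 1)]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(v, w):
    return sum(a * b for a, b in zip(v, w))


N = 50
D = second_difference_matrix(N)
h = math.pi / (N + 1)
for n in (1, 2, 3):
    mu = -4.0 / h ** 2 * math.sin(n * h / 2) ** 2   # exact eigenvalue of D, close to -n^2
    print(n, mu)
print(dot(sampled_sine(1, N), sampled_sine(2, N)))  # zero up to rounding
```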

The analogue of the dot product $\cdot$ for the space of functions on $[0,\pi]$ is the integral

$$f \cdot g = \int_0^\pi f(x)g(x)\,dx.$$

In function space, $A$ is replaced by an appropriate operator like $d^2/dx^2$. The analogue of the equation

$$v \cdot Aw = (Av)\cdot w$$

is obtained by integrating by parts twice:

$$\int_0^\pi f g''\,dx = -\int_0^\pi f'g'\,dx = \int_0^\pi f''g\,dx.$$

(Note that we need $f(0) = f(\pi) = g(0) = g(\pi) = 0$, or other suitable boundary conditions, for this integration to work without picking up boundary terms.)
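A quick numerical check of this identity follows. It is a sketch using Simpson's rule; the helper names and the test functions $f(x) = x(\pi - x)$, $g(x) = \sin x$ are illustrative choices. Both functions vanish at $0$ and $\pi$, and both pairings come out as $-4$. Dropping the boundary condition on $f$ breaks the symmetry. For example, with $f(x) = x$ the two sides become $-\pi$ and $0$, and the difference is exactly the boundary term $[fg' - f'g]_0^\pi$.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def pairing(f, g):
    """The function-space 'dot product' f . g = integral over [0, pi] of f(x) g(x) dx."""
    return simpson(lambda x: f(x) * g(x), 0.0, math.pi)


f, f2 = (lambda x: x * (math.pi - x)), (lambda x: -2.0)  # f and its second derivative
g, g2 = math.sin, (lambda x: -math.sin(x))               # g and its second derivative

print(pairing(f, g2), pairing(f2, g))  # both approximately -4

# f(x) = x does not vanish at pi, so a boundary term survives:
print(pairing(lambda x: x, g2), pairing(lambda x: 0.0, g))  # approximately -pi and 0
```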

A better name for "symmetric" in this context is "self-adjoint": an operator $A$ has an adjoint operator $A^\dagger$ defined by

$$(A^\dagger v)\cdot w = v \cdot Aw$$

and self-adjointness means that $A = A^\dagger$.



5.4 Discontinuities

As a final example, I want to tackle the problem of what happens when your equation becomes discontinuous. For example, maybe you're interested in the heat equation on a rod $[0,2\pi]$ whose thermal diffusivity changes dramatically halfway along. You might model this using the heat equation

$$\frac{\partial\phi}{\partial t} = \sigma(x)\frac{\partial^2\phi}{\partial x^2}$$

where $\sigma$ is the discontinuous function

$$\sigma(x) = \begin{cases}\sigma_1 & \text{if } x \in [0,\pi],\\ \sigma_2 & \text{if } x \in (\pi,2\pi].\end{cases}$$

Let's figure out what the separated solutions are when $\sigma_1 = 1$ and $\sigma_2 = 2$ (so the right-hand half of the rod conducts heat more quickly than the left-hand half).
We impose the boundary conditions $\phi(0,t) = \phi(2\pi,t) = 0$. Let's not impose an initial condition and just look for the separated solutions. We will also need extra "boundary conditions" at the discontinuity: we will require our separated solutions and their first derivatives to be continuous at $\pi$.



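Let's write

$$\phi_1(x,t) = X_1(x)T_1(t) \quad \text{on } x \in [0,\pi], \qquad \phi_2(x,t) = X_2(x)T_2(t) \quad \text{on } x \in [\pi,2\pi].$$

On $[0,\pi]$ we get $T_1' = \lambda T_1$ and $X_1'' = \lambda X_1$ as usual. On $[\pi,2\pi]$ we get $T_2'/T_2 = 2X_2''/X_2 = \lambda$, so $T_2' = \lambda T_2$ and $X_2'' = (\lambda/2)X_2$. We use the same separation constant $\lambda$ on both halves, as continuity at $\pi$ for all $t$ requires. Therefore, taking $\lambda < 0$, the separated solutions we seek are

$$X_1(x) = A_1\sin(x\sqrt{-\lambda}) + B_1\cos(x\sqrt{-\lambda}), \qquad X_2(x) = A_2\sin(x\sqrt{-\lambda/2}) + B_2\cos(x\sqrt{-\lambda/2}).$$

The boundary condition $\phi(0,t) = 0$ becomes

$$X_1(0) = 0 \implies B_1 = 0$$

and the boundary condition $\phi(2\pi,t) = 0$ becomes

$$X_2(2\pi) = 0 \implies A_2\sin(2\pi\sqrt{-\lambda/2}) + B_2\cos(2\pi\sqrt{-\lambda/2}) = 0.$$

We have four constants left, $A_1, A_2, B_2, \lambda$, and only one equation connecting them. We need to impose continuity of $X$ and $X'$ at $x = \pi$ to fix these extra constants.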



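We have

$$X_1(\pi) = A_1\sin(\pi\sqrt{-\lambda}), \qquad X_2(\pi) = A_2\sin(\pi\sqrt{-\lambda/2}) + B_2\cos(\pi\sqrt{-\lambda/2}),$$

so $X_1(\pi) = X_2(\pi)$ implies

$$A_1\sin(\pi\sqrt{-\lambda}) = A_2\sin(\pi\sqrt{-\lambda/2}) + B_2\cos(\pi\sqrt{-\lambda/2}),$$

and

$$X_1'(\pi) = \sqrt{-\lambda}\,A_1\cos(\pi\sqrt{-\lambda}), \qquad X_2'(\pi) = \sqrt{-\lambda/2}\left(A_2\cos(\pi\sqrt{-\lambda/2}) - B_2\sin(\pi\sqrt{-\lambda/2})\right),$$

so $X_1'(\pi) = X_2'(\pi)$ implies

$$\sqrt{-\lambda}\,A_1\cos(\pi\sqrt{-\lambda}) = \sqrt{-\lambda/2}\left(A_2\cos(\pi\sqrt{-\lambda/2}) - B_2\sin(\pi\sqrt{-\lambda/2})\right).$$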
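We already know that

$$A_2\sin(2\pi\sqrt{-\lambda/2}) = -B_2\cos(2\pi\sqrt{-\lambda/2}),$$

i.e. $B_2 = -A_2\tan(2\pi\sqrt{-\lambda/2})$, so the two new equations give

$$A_1\sin(\pi\sqrt{-\lambda}) = A_2\left(\sin(\pi\sqrt{-\lambda/2}) - \tan(2\pi\sqrt{-\lambda/2})\cos(\pi\sqrt{-\lambda/2})\right)$$

$$A_1\sqrt{-\lambda}\cos(\pi\sqrt{-\lambda}) = \sqrt{-\lambda/2}\,A_2\left(\cos(\pi\sqrt{-\lambda/2}) + \tan(2\pi\sqrt{-\lambda/2})\sin(\pi\sqrt{-\lambda/2})\right)$$

Dividing the first equation by the second gives

$$\frac{\tan(\pi\sqrt{-\lambda})}{\sqrt{-\lambda}} = \frac{\sin(\pi\sqrt{-\lambda/2}) - \tan(2\pi\sqrt{-\lambda/2})\cos(\pi\sqrt{-\lambda/2})}{\sqrt{-\lambda/2}\left(\cos(\pi\sqrt{-\lambda/2}) + \tan(2\pi\sqrt{-\lambda/2})\sin(\pi\sqrt{-\lambda/2})\right)}$$

so, using $\sqrt{-\lambda}/\sqrt{-\lambda/2} = \sqrt{2}$, dividing the top and bottom by $\cos(\pi\sqrt{-\lambda/2})$, and then applying the subtraction formula for $\tan$,

$$\tan(\pi\sqrt{-\lambda}) = \sqrt{2}\,\frac{\tan(\pi\sqrt{-\lambda/2}) - \tan(2\pi\sqrt{-\lambda/2})}{1 + \tan(2\pi\sqrt{-\lambda/2})\tan(\pi\sqrt{-\lambda/2})} = -\sqrt{2}\tan\left(\pi\sqrt{-\lambda/2}\right).$$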

So $\lambda$ has to satisfy this bizarre equation! Plot the graphs of $\tan(\pi\sqrt{-\lambda})$ and $-\sqrt{2}\tan(\pi\sqrt{-\lambda/2})$ as functions of $\lambda$ (see Figure 5.1). They intersect at a discrete set of points with abscissae $\lambda_n$. Therefore we get separated solutions with these $\lambda_n$, and the other constants can be calculated from the relations we have already found.

This ceases to be something we can do explicitly, because there is no closed-form expression for the numbers $\lambda_n$.
Figure 5.1: The graphs of $\tan(\pi\sqrt{-\lambda})$ and $-\sqrt{2}\tan(\pi\sqrt{-\lambda/2})$ as functions of $\lambda$. The $x$-coordinates of the (discrete set of) intersection points are the numbers we are calling $\lambda_n$.
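Although there is no closed form, the $\lambda_n$ are easy to locate numerically. The sketch below is not from the notes, and its names and tolerances are illustrative. It writes $\mu = \sqrt{-\lambda}$ and multiplies the equation through by $\cos(\pi\mu)\cos(\pi\mu/\sqrt{2})$ to get the pole-free form

$$D(\mu) = \sin(\pi\mu)\cos(\pi\mu/\sqrt{2}) + \sqrt{2}\cos(\pi\mu)\sin(\pi\mu/\sqrt{2}) = 0.$$

Up to a nonzero factor, the same $D$ is the determinant of the two continuity conditions once one writes $X_2(x) = C\sin(\sqrt{-\lambda/2}\,(x - 2\pi))$, which already vanishes at $2\pi$. So this form introduces no spurious roots. The code scans for sign changes of $D$, assuming each root is simple, and refines each one by bisection.

```python
import math

SQRT2 = math.sqrt(2.0)


def matching_determinant(mu):
    """D(mu) = sin(pi mu) cos(pi mu / sqrt 2) + sqrt 2 cos(pi mu) sin(pi mu / sqrt 2), mu = sqrt(-lambda)."""
    s = mu / SQRT2
    return (math.sin(math.pi * mu) * math.cos(math.pi * s)
            + SQRT2 * math.cos(math.pi * mu) * math.sin(math.pi * s))


def rod_eigenvalues(count, step=1e-3):
    """Return the first `count` eigenvalues lambda_n = -mu_n^2 (assumes each root of D is simple)."""
    found = []
    a, fa = step, matching_determinant(step)  # D > 0 for small mu > 0, so start just above 0
    while len(found) < count:
        b = a + step
        fb = matching_determinant(b)
        if fa * fb < 0:
            lo, hi, f_lo = a, b, fa
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                f_mid = matching_determinant(mid)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            found.append(-(0.5 * (lo + hi)) ** 2)
        a, fa = b, fb
    return found


print(rod_eigenvalues(4))
```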



Chapter 6 



The wave equation 



6.1 Derivation 

Let's consider a uniform one-dimensional string stretched out straight between two points $(0,0)$ and $(L,0)$. Make a small perturbation of the string so that it follows the graph of a function $u\colon [0,L] \to \mathbb{R}$ with $u(0) = u(L) = 0$. We will let go of the string and allow it to vibrate. At time $t$ and above the point $x \in [0,L]$, the height of the string will be $\phi(x,t)$. Here's a claim which we won't justify, but which should be intuitively appealing because of Hooke's law, which relates tension and lengthening:

Claim 76. Assume that the perturbation $\phi$ is small, and assume the same of its derivatives. The potential energy in this perturbed string is (to a good approximation) proportional to the difference between its length and the length $L$ of the unperturbed string, i.e. equal to

$$\tau\left(\int_0^L\sqrt{1 + \phi_x^2}\,dx - L\right)$$

where $\tau$ is the tension per unit of lengthening. To lowest order in $\phi$ (by Taylor expanding the integrand) this is approximately

$$\frac{\tau}{2}\int_0^L\phi_x^2\,dx.$$

Moreover, the total kinetic energy of the string, once it is in motion, is to a good approximation

$$\frac{\rho}{2}\int_0^L\phi_t^2\,dx$$

where $\rho$ is the density in units of mass per unit length (constant by the assumption of uniformity).
We saw on Problem Sheet 5 that, in the Lagrangian formulation of mechanics, equations of motion can be derived by extremising the action. The action is the time integral of the Lagrangian, which is the difference between kinetic and potential energy. Using this principle, we will derive the equation of motion for the string by extremising the functional

$$\int\!\!\int\left(\frac{\rho}{2}\phi_t^2 - \frac{\tau}{2}\phi_x^2\right)dx\,dt.$$

The (two-variable) Euler-Lagrange equation for this functional is

$$\frac{\partial}{\partial t}\left(\rho\,\phi_t\right) - \frac{\partial}{\partial x}\left(\tau\,\phi_x\right) = 0,$$

or

$$\frac{1}{c^2}\frac{\partial^2\phi}{\partial t^2} = \frac{\partial^2\phi}{\partial x^2}$$

where $c = \sqrt{\tau/\rho}$. We will see that $c$ can be interpreted as the speed of the waves of vertical disturbance that start to move up and down the string.



6.2 Boundary conditions 

We have already seen how to solve the wave equation (and more generally linear hyperbolic equations with constant coefficients) in Section 3.3.2. The general solution is

$$\phi(x,t) = F(x+ct) + G(x-ct)$$

for some arbitrary functions $F$ and $G$. The only remaining task is to compute $F$ and $G$ from the initial data of the string at time $t = 0$. There are two arbitrary functions $F$ and $G$ (equivalently, the wave equation is second order in time). So we need to know more than just the initial profile $u$ of the string: we need extra boundary or initial conditions. Above, we assumed fixed endpoints $\phi(0,t) = \phi(L,t) = 0$, but there are other possibilities:

• Maybe the string is infinitely long, but we know $v(x) = \frac{\partial\phi}{\partial t}(x,0)$. If we were just letting go of the string, it would be stationary at the instant we let it go (though it would immediately start to accelerate). In other words we would have $v = 0$. However, we allow ourselves more general functions $v$.

• Maybe the string extends infinitely far in only one direction (i.e. along the interval $[0,\infty)$), but is fixed so that $\phi(0,t) = 0$.

More generally there are all sorts of boundary conditions you could stick on 
your string. We'll just concern ourselves with the fixed endpoints and infinite 
string settings. 
6.2.1 The bounded string: Fourier's solution

We deal first with the case of a string stretched between $0$ and $L$ and fixed at height zero. That is, we impose the boundary conditions

$$\phi(0,t) = \phi(L,t) = 0.$$

We will seek solutions to the wave equation by separating variables, that is, looking for solutions $\phi(x,t) = X(x)T(t)$. Substituting this into the wave equation gives

$$X''T = \frac{1}{c^2}XT''$$

and so we only need to solve the simple harmonic oscillator equations

$$X'' = \lambda X, \qquad T'' = c^2\lambda T.$$

Claim 77. There are no nontrivial solutions satisfying the boundary conditions if $\lambda \geq 0$.

Proof. When $\lambda = 0$ we have $X'' = 0$, which means $X = Ax + B$. For this to vanish at $x = 0, L$ we need $A = B = 0$. Similarly, if $\lambda > 0$ then $X = A\cosh(x\sqrt{\lambda}) + B\sinh(x\sqrt{\lambda})$. Setting $x = 0, L$ and requiring $\phi(0,t)$ and $\phi(L,t)$ to vanish implies

$$A\cosh(0) + B\sinh(0) = 0 \implies A = 0$$

and

$$B\sinh(L\sqrt{\lambda}) = 0 \implies B = 0,$$

since $\sinh(L\sqrt{\lambda}) \neq 0$. Hence $A = B = 0$ and the solution is trivial. □

In the remaining case $\lambda < 0$ there are nontrivial solutions precisely when $\sqrt{-\lambda} = k\pi/L$ for a positive integer $k$. As usual,

$$X(x) = \sin(k\pi x/L), \qquad T(t) = C_k\cos(kc\pi t/L) + D_k\sin(kc\pi t/L).$$



Because the equation is second order in time, we need to specify both $f(x) = \phi(x,0)$ and $g(x) = \frac{\partial\phi}{\partial t}(x,0)$. Suppose that

$$f(x) = \sum_{k=1}^{\infty}F_k\sin(k\pi x/L), \qquad g(x) = \sum_{k=1}^{\infty}G_k\sin(k\pi x/L)$$

are the half-range sine Fourier series for $f$ and $g$. Then if

$$\phi(x,t) = \sum_{k=1}^{\infty}\left(C_k\cos(kc\pi t/L) + D_k\sin(kc\pi t/L)\right)\sin(k\pi x/L)$$

we see that $f(x) = \phi(x,0)$ implies

$$\sum_{k=1}^{\infty}F_k\sin(k\pi x/L) = \sum_{k=1}^{\infty}C_k\sin(k\pi x/L)$$

and $g(x) = \frac{\partial\phi}{\partial t}(x,0)$ implies

$$\sum_{k=1}^{\infty}G_k\sin(k\pi x/L) = \sum_{k=1}^{\infty}\frac{k\pi c}{L}D_k\sin(k\pi x/L),$$

so $C_k = F_k$ and $D_k = \dfrac{LG_k}{k\pi c}$. The solution is therefore

$$\phi(x,t) = \sum_{k=1}^{\infty}\left(F_k\cos(kc\pi t/L) + \frac{LG_k}{k\pi c}\sin(kc\pi t/L)\right)\sin(k\pi x/L).$$
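Below is a direct transcription of this formula, truncated to finitely many modes. It is a sketch: the function name `string_solution` and the convention that `F[k-1]`, `G[k-1]` hold $F_k$, $G_k$ are illustrative. The coefficient $LG_k/(k\pi c)$ is written as $G_k/\omega_k$, where $\omega_k = k\pi c/L$.

```python
import math


def string_solution(F, G, L, c, x, t):
    """Truncated Fourier solution of the wave equation on [0, L] with fixed ends.

    F[k-1] and G[k-1] are the half-range sine coefficients of phi(x, 0) and
    d(phi)/dt(x, 0); the sum runs over k = 1, ..., len(F).
    """
    total = 0.0
    for k, (Fk, Gk) in enumerate(zip(F, G), start=1):
        omega = k * math.pi * c / L  # angular frequency of the k-th normal mode
        total += (Fk * math.cos(omega * t) + Gk / omega * math.sin(omega * t)) * math.sin(k * math.pi * x / L)
    return total


F = [1.0, 0.0, 0.3]
G = [0.0, 2.0, 0.0]
print(string_solution(F, G, 2.0, 3.0, 0.7, 0.4))
```

Every $\omega_k$ is a multiple of $\pi c/L$, so the truncated solution is periodic in $t$ with period $2L/c$.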

This has an obvious oscillatory behaviour in time, which is characteristic of solutions to the wave equation. Contrast the heat equation, where oscillatory behaviour is suppressed exponentially. Note that a Fourier mode with higher spatial frequency has correspondingly higher temporal frequency. This is because $X$ and $T$ satisfy simple harmonic motion with constants differing only by a factor of $1/c^2$.

The different summands are called the normal modes of vibration: $k = 1$ is called the fundamental mode, $k = 2$ the second harmonic, $k = 3$ the third harmonic, etc. Our claim about frequency means that a string vibrating in its $k$th mode will move up and down with angular frequency $k\pi c/L$, i.e. $kc/(2L)$ complete oscillations per unit time.

Example 78. Let's do the example of a plucked string, where the initial condition is

$$f(x) = \begin{cases} mx & \text{if } x \leq L/2,\\ m(L - x) & \text{if } x > L/2,\end{cases}$$

and $g(x) = 0$. The half-range Fourier sine series of $f(x)$ is

$$f(x) = \sum_{n=1}^{\infty}\frac{4mL}{n^2\pi^2}\sin\left(\frac{n\pi}{2}\right)\sin\left(\frac{n\pi x}{L}\right)$$

and the Fourier expansion of $g$ vanishes, so the solution is

$$\phi(x,t) = \sum_{n=1}^{\infty}\frac{4mL}{n^2\pi^2}\sin\left(\frac{n\pi}{2}\right)\sin\left(\frac{n\pi x}{L}\right)\cos\left(\frac{n\pi ct}{L}\right).$$
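Here is a numerical check of the plucked-string sine coefficients, given as a sketch. It assumes $f(x) = m(L-x)$ on $[L/2, L]$, as continuity of the plucked profile requires. The values $m = 0.1$ and $L = 2$ are arbitrary. Composite Simpson's rule is arranged so that the kink at $L/2$ falls on a panel endpoint. The results should match $\frac{4mL}{n^2\pi^2}\sin(n\pi/2)$, which vanishes for even $n$.

```python
import math


def simpson(func, a, b, n=2000):
    """Composite Simpson's rule; with n divisible by 4, the midpoint (a+b)/2 is a panel endpoint."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def plucked(x, m, L):
    """Triangular initial profile: slope m up to x = L/2, then back down to 0 at x = L."""
    return m * x if x <= L / 2 else m * (L - x)


def sine_coefficient(n, m, L):
    """(2/L) * integral_0^L f(x) sin(n pi x / L) dx for the plucked profile f."""
    return 2.0 / L * simpson(lambda x: plucked(x, m, L) * math.sin(n * math.pi * x / L), 0.0, L)


m, L = 0.1, 2.0
for n in range(1, 6):
    formula = 4 * m * L / (n * math.pi) ** 2 * math.sin(n * math.pi / 2)
    print(n, sine_coefficient(n, m, L), formula)
```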
6.2.2 The infinite string: D'Alembert's solution

In this section we will deal with the case of an infinite string, specifying $u(x) = \phi(x,0)$ and $v(x) = \frac{\partial\phi}{\partial t}(x,0)$. We have already seen how to solve the wave equation (and more generally linear hyperbolic equations with constant coefficients) in Section 3.3.2.

Since $\phi(x,t) = F(x+ct) + G(x-ct)$, setting $t = 0$ in $\phi$ and in $\partial\phi/\partial t$ gives

$$u(x) = F(x) + G(x), \qquad v(x) = c\left(F'(x) - G'(x)\right)$$

and so by the fundamental theorem of calculus

$$\int_0^x v(s)\,ds = c\left(F(x) - G(x)\right) + K$$

for some constant $K$, and hence

$$F(x) = \frac{1}{2}\left(u(x) + \frac{1}{c}\left(\int_0^x v(s)\,ds - K\right)\right), \qquad G(x) = \frac{1}{2}\left(u(x) - \frac{1}{c}\left(\int_0^x v(s)\,ds - K\right)\right),$$

which gives the general solution

$$\phi(x,t) = F(x+ct) + G(x-ct) = \frac{1}{2}\left(u(x+ct) + u(x-ct)\right) + \frac{1}{2c}\left(\int_0^{x+ct} v(s)\,ds - \int_0^{x-ct} v(s)\,ds\right)$$

$$= \frac{1}{2}\left(u(x+ct) + u(x-ct)\right) + \frac{1}{2c}\int_{x-ct}^{x+ct}v(s)\,ds,$$

due to D'Alembert. (The constant $K$ cancels.)
6.2.3 Example

For a similar example worked out from first principles, see Section 3.3.2.

Example 79. Suppose that $\phi(x,0) = \sin x$ and $\frac{\partial\phi}{\partial t}(x,0) = \cos x$. Then

$$\phi(x,t) = \frac{1}{2}\left(\sin(x+ct) + \sin(x-ct)\right) + \frac{1}{2c}\int_{x-ct}^{x+ct}\cos s\,ds$$

$$= \frac{1}{2}\left(\sin(x+ct) + \sin(x-ct)\right) + \frac{1}{2c}\left(\sin(x+ct) - \sin(x-ct)\right)$$

$$= \frac{1}{2}\left(1 + \frac{1}{c}\right)\sin(x+ct) + \frac{1}{2}\left(1 - \frac{1}{c}\right)\sin(x-ct).$$
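D'Alembert's formula can be evaluated directly for any initial data. The following is a minimal sketch: `dalembert` and the Simpson-rule helper are illustrative names, and the integral of $v$ is computed numerically. For the data of Example 79 it should agree with the closed form just obtained.

```python
import math


def simpson(func, a, b, n=400):
    """Composite Simpson's rule on [a, b] (n even); returns 0 when a == b."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def dalembert(u, v, c, x, t):
    """phi(x, t) = (u(x + ct) + u(x - ct)) / 2 + (1 / 2c) * integral of v over [x - ct, x + ct]."""
    return 0.5 * (u(x + c * t) + u(x - c * t)) + simpson(v, x - c * t, x + c * t) / (2 * c)


def example_79(x, t, c):
    """Closed form from Example 79 (u = sin, v = cos)."""
    return 0.5 * (1 + 1 / c) * math.sin(x + c * t) + 0.5 * (1 - 1 / c) * math.sin(x - c * t)


c = 2.0
print(dalembert(math.sin, math.cos, c, 0.3, 0.7), example_79(0.3, 0.7, c))
```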



6.2.4 Propagating signals

Let's try to understand d'Alembert's solution a little better. Assume for the moment that $v = 0$, so that the solution is

$$\phi(x,t) = \frac{1}{2}\left(u(x+ct) + u(x-ct)\right).$$

At each time $t$ this is a superposition of two terms. Both terms have the same profile (shape), namely the profile of $u$ scaled down by a factor of two. As time progresses, one of these profiles moves to the right and the other moves to the left. The right-moving one is $u(x-ct)$: a feature of $u$ at the point $a$ appears where $x - ct = a$, i.e. at $x = a + ct$, which increases with $t$. The left-moving one is $u(x+ct)$. For instance, in Example 79 the crests of the $\frac{1}{2}\left(1 + \frac{1}{c}\right)\sin(x+ct)$ wave occur at

$$x + ct = 2\pi n + \pi/2.$$

Let $X_n(t)$ denote the position of the $n$th crest at time $t$. This gives

$$X_n(t) = 2\pi n + \pi/2 - ct,$$

so as $t$ increases, $X_n(t)$ decreases and the crest moves to the left. We therefore call this a left-moving wave and the other a right-moving wave.

Example 80. Suppose that $v = 0$ and

$$u(x) = \begin{cases}1 & \text{if } |x| < 1,\\ 0 & \text{otherwise.}\end{cases}$$

Then two little square waves move off in either direction. Consider an observer standing $k$ metres to the left of the initial disturbance, at $x = -1-k$. They will only notice the arrival of the square wave after $k/c$ seconds. In other words, signals propagate at a speed of precisely $c$.

We represent this causal relationship diagrammatically by drawing the $(x,t)$-plane (called a spacetime diagram); see Figure 6.1. In the pictures, $c = 1$, so a wave front leaving the point $x = 0$ at time $t = 0$ traces out a line of slope $\pm 1$. We'll talk about the waves as light waves (just to aid the imagination). The lower part of the figure shows, as dotted lines, the light rays emitted forward in time from a point situated at the origin, and the light which arrives at that point from the past. The forward light cone is the set of all points which could be reached from this point by travelling along light rays (maybe alternating between left- and right-moving rays by a cunning use of mirrors). On the other hand, light can never reach the regions $|x| > t$, because it would have to travel faster than light to get there. The backward light cone is the set of all points from which light could conceivably reach the origin via a cunning system of mirrors.

In the upper part of the diagram you can see the support of the two square waves propagating to the left and right. The support of a function is the set of points where it is non-zero (strictly, the closure of that set). The interval $[-1,1]$ at time $0$ is the initial support of the square wave. As time progresses, the supports of the left- and right-moving square waves move left and right along the dotted lines.
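The finite signal speed in Example 80 can be checked by brute force. This sketch assumes $v = 0$ and the square profile above; the function names and the time step are illustrative. An observer at $x = -1-k$ sees nothing until $t = k/c$.

```python
def square(x):
    """Initial profile u(x) of Example 80."""
    return 1.0 if abs(x) < 1 else 0.0


def phi(x, t, c):
    """d'Alembert's solution with v = 0: two half-height copies of u moving apart."""
    return 0.5 * (square(x + c * t) + square(x - c * t))


def arrival_time(x_obs, c, dt=1e-3, t_max=100.0):
    """First sampled time t = i * dt at which the displacement at x_obs is non-zero."""
    for i in range(int(t_max / dt) + 1):
        t = i * dt
        if phi(x_obs, t, c) != 0.0:
            return t
    return None


k, c = 3.0, 2.0
print(arrival_time(-1.0 - k, c), k / c)
```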
Figure 6.1: A spacetime diagram showing light rays (moving at speed c = 1)
emitted from a point at the origin (lower part) and showing the motion of a 
left- and right-moving square wave (upper part). 



6.2.5 Comparison of Fourier and D'Alembert solutions

We now have two ways of solving the wave equation: by separating variables à la Fourier, and by a clever change of coordinates due to D'Alembert. The separated-variables method is easier to apply to boundary-value problems, because it is not so easy to express the boundary conditions in terms of the arbitrary functions $F$ and $G$ from D'Alembert's solution. Of course the two solutions are related by a standard trigonometric identity:

$$\sin\left(\frac{k\pi x}{L}\right)\cos\left(\frac{k\pi ct}{L}\right) = \frac{1}{2}\left(\sin\left(\frac{k\pi(x+ct)}{L}\right) + \sin\left(\frac{k\pi(x-ct)}{L}\right)\right),$$

i.e. a separated solution can be seen as a superposition of right- and left-moving waves.

6.3 Some simple waves on an infinite string 
6.3.1 Simple waves on an infinite string 

One particularly simple wave which we're allowed on an infinite string is the simple harmonic wave

$$\phi(x,t) = A\cos\left(k(x-ct) + B\right)$$

which we could re-express in terms of $\sin(k(x-ct))$ and $\cos(k(x-ct))$ if we were so inclined. The maximal height $A$ is the amplitude, $k$ is the angular wavenumber, $\lambda = 2\pi/k$ is the wavelength, $\omega = kc$ is the angular frequency and $T = 2\pi/\omega$ is the period. Changing the constant $B$ is called shifting the phase of the wave. This wave is moving to the right, as in Section 6.2.4: the crest where $k(x-ct) + B = 0$ sits at $x = ct - B/k$, which increases with $t$. You could create a left-moving wave by using $x + ct$.

A standing wave, which, as the name suggests, doesn't move, can be obtained by adding a left-moving and a right-moving simple harmonic wave with the same amplitude and frequency:

$$\phi(x,t) = A\cos(k(x-ct)) + A\cos(k(x+ct)) = 2A\cos(kx)\cos(kct).$$

Another way to write the simple harmonic wave is as

$$\phi(x,t) = \mathrm{Re}\left(Ae^{iB}\exp(ik(x-ct))\right).$$

6.3.2 Reflection and transmission coefficients

Suppose that there are two strings tied together at the origin and extending off to infinity in either direction. Suppose that waves travel at speed $c_-$ in the left-hand string and $c_+$ in the right-hand string. Imagine a simple harmonic wave incoming from the left,

$$\phi(x,t) = \mathrm{Re}\left(A\exp(ik(x - c_-t))\right), \qquad x < 0.$$

When it hits the discontinuity, some of it is reflected and some of it is transmitted. The result is a composition of three waves (incoming, reflected and transmitted) with amplitudes $A, R, T$ and wavenumbers $k, \ell, m$. In other words, the resulting wave has the form

$$\phi(x,t) = \begin{cases}\mathrm{Re}\left(A\exp(ik(x - c_-t)) + R\exp(i\ell(x + c_-t))\right) & \text{when } x < 0,\\ \mathrm{Re}\left(T\exp(im(x - c_+t))\right) & \text{when } x > 0.\end{cases}$$
Note that the reflected term is left-moving: it travels back away from the origin. This is certainly a solution to the wave equation away from $x = 0$: it's a linear superposition of solutions and the equation is linear. At $x = 0$ it's not clear that the wave equation even makes sense, because $\phi$ might not be twice differentiable there. However, in physics people often make such guesses. They claim that the equation itself is discontinuous at the origin, so it doesn't matter that the solution isn't twice differentiable there: we simply don't care what's going on. Some physical process which we don't understand reflects and transmits the waves. All we can do is observe the reflected and transmitted waves as they travel out, away from the origin, to regions where we understand the wave equation! This is characteristic of scattering calculations.

Despite our ignorance, we will try to calculate $R$, $T$, $\ell$ and $m$ in terms of $A$ and $k$. To do this we need to impose some assumptions about $\phi$ at $x = 0$: namely, we want $\phi$ and its first $x$-derivative to be continuous at $x = 0$. Then

$$Ae^{-ikc_-t} + Re^{i\ell c_-t} = Te^{-imc_+t}$$

and

$$kAe^{-ikc_-t} + \ell Re^{i\ell c_-t} = mTe^{-imc_+t}.$$

The functions $e^{i\omega t}$ with distinct $\omega$ are linearly independent in the space of all complex-valued functions, so we need $kc_- = -\ell c_- = mc_+$, that is

$$\ell = -k, \qquad m = kc_-/c_+.$$

The two equations become

$$A + R = T$$

and

$$k(A - R) = kTc_-/c_+$$

which we can solve to get

$$T = \frac{2Ac_+}{c_- + c_+}, \qquad R = \frac{A(c_+ - c_-)}{c_- + c_+}.$$
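A numerical check that these values really make $\phi$ and $\phi_x$ continuous at the junction is given below. The sketch assumes the reflected wave is written as $R\exp(i\ell(x + c_-t))$, the form consistent with the matching condition $kc_- = -\ell c_-$. The names and sample parameters are illustrative.

```python
import cmath


def coefficients(A, c_minus, c_plus):
    """Reflected and transmitted amplitudes R, T for incident amplitude A."""
    R = A * (c_plus - c_minus) / (c_minus + c_plus)
    T = 2 * A * c_plus / (c_minus + c_plus)
    return R, T


def phi(x, t, A, k, c_minus, c_plus):
    """Real displacement: incident + reflected wave for x < 0, transmitted wave for x >= 0."""
    R, T = coefficients(A, c_minus, c_plus)
    ell, m = -k, k * c_minus / c_plus
    if x < 0:
        z = A * cmath.exp(1j * k * (x - c_minus * t)) + R * cmath.exp(1j * ell * (x + c_minus * t))
    else:
        z = T * cmath.exp(1j * m * (x - c_plus * t))
    return z.real


A, k, c_minus, c_plus = 1.0, 2.0, 1.0, 3.0
print(coefficients(A, c_minus, c_plus))  # R = 0.5, T = 1.5 for these values
```

With equal wave speeds on both sides ($c_- = c_+$), the formulas give $R = 0$ and $T = A$: nothing is reflected.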
Appendices






Appendix A 

Recap of Fourier theory 



A.1 Fourier series

Let $L \in (0,\infty)$ be a positive real number. Fourier theory is concerned with functions $f\colon \mathbb{R} \to \mathbb{R}$ which are periodic with period $2L$ (in the sense that $f(x + 2L) = f(x)$), and with the attempt to express them in the form

$$f(x) = c + \sum_{n=1}^{\infty}a_n\cos\left(\frac{n\pi x}{L}\right) + \sum_{n=1}^{\infty}b_n\sin\left(\frac{n\pi x}{L}\right). \qquad \text{(A.1)}$$



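The letters $m$ and $n$ will always stand for integers greater than or equal to one.

$$\int_{-L}^{L}\sin\left(\frac{m\pi x}{L}\right)\sin\left(\frac{n\pi x}{L}\right)dx = \begin{cases}0 & \text{if } m \neq n,\\ L & \text{otherwise.}\end{cases} \qquad \text{(A.2)}$$

$$\int_{-L}^{L}\cos\left(\frac{m\pi x}{L}\right)\cos\left(\frac{n\pi x}{L}\right)dx = \begin{cases}0 & \text{if } m \neq n,\\ L & \text{otherwise.}\end{cases} \qquad \text{(A.3)}$$

$$\int_{-L}^{L}\sin\left(\frac{m\pi x}{L}\right)\cos\left(\frac{n\pi x}{L}\right)dx = 0. \qquad \text{(A.4)}$$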

Assume for one moment that we are allowed to swap integrals and sums¹, for instance:

$$\int_{-L}^{L}\sum_{n=1}^{\infty}a_n\cos\left(\frac{n\pi x}{L}\right)dx = \sum_{n=1}^{\infty}a_n\int_{-L}^{L}\cos\left(\frac{n\pi x}{L}\right)dx.$$

¹You will see justification for this kind of operation next term in Analysis 4. Specifically, it is enough to assume that the Fourier series converges uniformly to $f$.
Then a function $f(x)$ with Fourier series given by (A.1) has Fourier coefficients

$$c = \frac{1}{2L}\int_{-L}^{L}f(x)\,dx \qquad \text{(A.5)}$$

$$a_m = \frac{1}{L}\int_{-L}^{L}f(x)\cos\left(\frac{m\pi x}{L}\right)dx \qquad \text{(A.6)}$$

$$b_m = \frac{1}{L}\int_{-L}^{L}f(x)\sin\left(\frac{m\pi x}{L}\right)dx \qquad \text{(A.7)}$$

So when do Fourier series exist? Provided $f$ is Riemann-integrable over $[-L,L]$, you can make sense of the integrals (A.6) and (A.7) and write down a Fourier series (see Question 5 above).

But when do they actually have anything to do with the function you started with? Let's define the partial sums

$$S_N(x) = c + \sum_{n=1}^{N}a_n\cos\left(\frac{n\pi x}{L}\right) + \sum_{n=1}^{N}b_n\sin\left(\frac{n\pi x}{L}\right).$$

Theorem 81. If $f\colon \mathbb{R} \to \mathbb{R}$ is a piecewise continuously differentiable function with period $2L$, then it admits a Fourier series (A.1) which converges pointwise to $f$. That is, for all $x \in \mathbb{R}$ such that $f$ is differentiable at $x$, the difference

$$f(x) - S_N(x)$$

tends to zero as $N \to \infty$. Moreover, when $x_0$ is a point where $f$ is discontinuous, the Fourier sums converge to

$$\lim_{N\to\infty}S_N(x_0) = \lim_{\epsilon\to 0}\frac{f(x_0 - \epsilon) + f(x_0 + \epsilon)}{2}.$$
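Here is an example (an illustrative sketch, not from the notes). The $2\pi$-periodic extension of $f(x) = x$ on $(-\pi,\pi)$ has $L = \pi$, $c = a_n = 0$ and $b_n = 2(-1)^{n+1}/n$. At $x = \pi/2$ the partial sums approach $f(\pi/2) = \pi/2$, though slowly. At the jump $x = \pi$, every $S_N(\pi)$ is zero, which is the average of the one-sided limits $\pi$ and $-\pi$.

```python
import math


def sawtooth_partial_sum(x, N):
    """S_N(x) for the 2*pi-periodic extension of f(x) = x on (-pi, pi): b_n = 2(-1)^(n+1)/n."""
    return sum(2 * (-1) ** (n + 1) / n * math.sin(n * x) for n in range(1, N + 1))


for N in (10, 100, 1000):
    print(N, sawtooth_partial_sum(math.pi / 2, N), sawtooth_partial_sum(math.pi, N))
```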



A.2 Square-integrable functions and Parseval's theorem

Theorem 82 (Parseval's theorem). Suppose $f\colon \mathbb{R} \to \mathbb{R}$ is periodic with period $2L$ and square-integrable over $[-L,L]$, meaning that the integral

$$\int_{-L}^{L}|f(x)|^2\,dx$$

exists and is finite. Then

$$S_N \to f$$

in the very weak sense that

$$\int_{-L}^{L}|f(x) - S_N(x)|^2\,dx \to 0$$

as $N \to \infty$. In terms familiar from statistics, you can think of $S_N$ as a least-squares approximation to $f$ (but by Fourier sums, rather than polynomials). Consequently,

$$2c^2 + \sum_{n=1}^{\infty}\left(a_n^2 + b_n^2\right) = \frac{1}{L}\int_{-L}^{L}|f(x)|^2\,dx. \qquad \text{(A.8)}$$
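Take the $2\pi$-periodic extension of $f(x) = x$ on $(-\pi,\pi)$, which has $L = \pi$, $c = a_n = 0$ and $b_n = 2(-1)^{n+1}/n$; this is an illustrative sketch, not from the notes. For this function (A.8) reads $\sum 4/n^2 = 2\pi^2/3$, which is Euler's $\sum 1/n^2 = \pi^2/6$. The same orthonormality argument predicts the mean-square error

$$\int_{-L}^{L}|f - S_N|^2\,dx = L\left(\frac{1}{L}\int_{-L}^{L}f^2\,dx - \sum_{n \le N}b_n^2\right),$$

which the code checks with Simpson's rule.

```python
import math


def simpson(func, a, b, n=4000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


def partial_sum(x, N):
    """S_N(x) for f(x) = x on (-pi, pi), using b_n = 2(-1)^(n+1)/n."""
    return sum(2 * (-1) ** (n + 1) / n * math.sin(n * x) for n in range(1, N + 1))


L = math.pi
rhs = simpson(lambda x: x * x, -L, L) / L           # (1/L) * integral of f^2, equal to 2*pi^2/3
N = 10
lhs_N = sum((2 / n) ** 2 for n in range(1, N + 1))  # 2c^2 + sum of (a_n^2 + b_n^2) for n <= N
mse = simpson(lambda x: (x - partial_sum(x, N)) ** 2, -L, L)
print(rhs, lhs_N, mse, L * (rhs - lhs_N))
```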



A.3 Interpretation

The way you should think about what's going on is by analogy with finite-dimensional vector spaces. On a finite-dimensional vector space $V$ you can pick an inner product

$$\langle\cdot,\cdot\rangle\colon V \times V \to \mathbb{R}$$

which, given two vectors, outputs a number. The vectors $v, w$ are orthogonal if $\langle v,w\rangle = 0$. A basis $e_1,\ldots,e_n$ is orthonormal if $\langle e_i,e_j\rangle = \delta_{ij}$.

In Fourier theory the relevant vector space is the space of periodic functions with period $2L$, square-integrable on $[-L,L]$. The inner product of $f$ and $g$ is

$$\langle f,g\rangle = \int_{-L}^{L}f(x)g(x)\,dx.$$

Note that this is a well-defined integral. If the integrals $\int_{-L}^{L}f(x)^2\,dx$ and $\int_{-L}^{L}g(x)^2\,dx$ are both finite then, by the Cauchy-Schwarz inequality,

$$\left|\int_{-L}^{L}f(x)g(x)\,dx\right| \leq \left(\int_{-L}^{L}f(x)^2\,dx\right)^{1/2}\left(\int_{-L}^{L}g(x)^2\,dx\right)^{1/2},$$

so the inner product is also finite.

With respect to this inner product the "vectors" (i.e. functions)

$$\frac{1}{\sqrt{2L}}, \qquad \frac{1}{\sqrt{L}}\cos\left(\frac{m\pi x}{L}\right), \qquad \frac{1}{\sqrt{L}}\sin\left(\frac{m\pi x}{L}\right), \qquad m \geq 1$$

are all orthonormal. However, they do not form a basis! Being a basis for a vector space means that any other vector (in this case, function) can be written as a linear combination of a finite number of basis elements. The Fourier coefficients are the orthogonal projections of a given square-integrable function $f$ onto these directions, and we have certainly seen functions with infinitely many nonzero Fourier coefficients.

The right way to think about this is as follows. The space of Fourier sums (i.e. Fourier series with only finitely many nonzero coefficients) is dense in the space of all square-integrable functions. So you can approximate arbitrary square-integrable functions (in the least-squares sense!) by a sequence of Fourier sums $S_N$ and let $N \to \infty$.
Appendix B

Sage code for diagrams 



B.1 Figure 1.1



y=var('y')
x=var('x')
t=var('t')
# graph of xy/(x^2+y^2) over [-1,1]^2
P=plot3d(x*y/(x^2+y^2), (x,-1,1), (y,-1,1), plot_points=[150,150], frame=False)
# coordinate axes drawn as thick black segments
Ax=parametric_plot(vector((t,0,0)), (t,-1.1,1.1), thickness=3, color='black')
Ay=parametric_plot(vector((0,t,0)), (t,-1.1,1.1), thickness=3, color='black')
Az=parametric_plot(vector((0,0,t)), (t,-0.7,0.7), thickness=3, color='black')
show(P+Ax+Ay+Az)



B.2 Figure 1.2



x=var('x')
y=var('y')
t=var('t')
# the surface, and a small gray patch of the plane z = -x
P=plot3d(x^3-x-y^2, (x,-1,1), (y,-1,1), opacity=0.8, plot_points=[20,20], frame=False)
Q=plot3d(-x, (x,-0.4,0.4), (y,-0.4,0.4), opacity=0.8, color='gray', plot_points=[10,10], frame=False)
R=parametric_plot3d((0,0,0), (t,0,1), color='black', thickness=15)
show(P+Q+R)



B.3 Figure 1.3



y=var('y')
x=var('x')
t=var('t')
# the surface, and a small gray patch of the plane z = 0 through the origin
P=plot3d(x^4+y^4-x^2-y^2, (x,-0.5,1), (y,-0.5,1), plot_points=[20,20], opacity=0.85, frame=False)
Q=plot3d(0, (x,-0.3,0.3), (y,-0.3,0.3), color='gray', plot_points=[20,20], opacity=0.8, frame=False)
Ax=parametric_plot(vector((t,0,0)), (t,-1.1,1.1), thickness=3, color='black')
Ay=parametric_plot(vector((0,t,0)), (t,-1.1,1.1), thickness=3, color='black')
Az=parametric_plot(vector((0,0,t)), (t,-0.7,0.7), thickness=3, color='black')
R=parametric_plot3d((0,0,0), (t,0,1), color='black', thickness=15)
show(P+Q+R)



B.4 Figure 1.4



x=var('x')
y=var('y')
t=var('t')  # needed for the axes below
# saddle surface z = x^2 - y^2
P=plot3d(x^2-y^2, (x,-1,1), (y,-1,1), opacity=0.8, plot_points=[20,20], frame=False)
Ax=parametric_plot(vector((t,0,0)), (t,-1.1,1.1), thickness=3, color='black')
Ay=parametric_plot(vector((0,t,0)), (t,-1.1,1.1), thickness=3, color='black')
Az=parametric_plot(vector((0,0,t)), (t,-0.7,0.7), thickness=3, color='black')
show(P+Ax+Ay+Az)



B.5 Figures 4.1 and 4.2
B.5.1 Parts (a)



Use one or other of the show commands.



x=var('x')
y=var('y')
z=var('z')
t=var('t')
u=var('u')
# the vector field (1, z, 0), on all of [-1,1]^3 (p) or only for z >= 0 (q)
p=plot_vector_field3d((1,z,0), (x,-1,1), (y,-1,1), (z,-1,1), frame=False)
q=plot_vector_field3d((1,z,0), (x,-1,1), (y,-1,1), (z,0,1), frame=False)
# surfaces (x, y, z) = (t, u+t*u, u) and (t, u+t*u^2, u^2)
r=parametric_plot3d((t,u+t*u,u), (t,0,1), (u,-1,1), plot_points=[10,10], opacity=0.8, frame=False)
s=parametric_plot3d((t,u+t*u^2,u^2), (t,0,1), (u,-1,1), plot_points=[10,10], opacity=0.8, frame=False)
Ax=parametric_plot(vector((t,0,0)), (t,-1.1,1.1), thickness=3, color='black')
Ay=parametric_plot(vector((0,t,0)), (t,-1.1,1.1), thickness=3, color='black')
Az=parametric_plot(vector((0,0,t)), (t,-0.7,0.7), thickness=3, color='black')
Az2=parametric_plot(vector((0,0,t)), (t,0,0.7), thickness=3, color='black')
show(p+r+Ax+Ay+Az, mesh=True)
show(q+s+Ax+Ay+Az2, mesh=True)



B.5.2 Figure 4.1(b)



t=var('t')
# lines y = a(1 + x) for a = 0, 0.1, ..., 1
p1=parametric_plot((t,0), (t,-1.2,1), axes=False)
p2=parametric_plot((t,0.1+0.1*t), (t,-1.2,1), axes=False)
p3=parametric_plot((t,0.2+0.2*t), (t,-1.2,1), axes=False)
p4=parametric_plot((t,0.3+0.3*t), (t,-1.2,1), axes=False)
p5=parametric_plot((t,0.4+0.4*t), (t,-1.2,1), axes=False)
p6=parametric_plot((t,0.5+0.5*t), (t,-1.2,1), axes=False)
p7=parametric_plot((t,0.6+0.6*t), (t,-1.2,1), axes=False)
p8=parametric_plot((t,0.7+0.7*t), (t,-1.2,1), axes=False)
p9=parametric_plot((t,0.8+0.8*t), (t,-1.2,1), axes=False)
pA=parametric_plot((t,0.9+0.9*t), (t,-1.2,1), axes=False)
pB=parametric_plot((t,1+1^2*t), (t,-1.2,1), axes=False)
# their mirror images y = -a(1 + x)
q1=parametric_plot((t,0), (t,-1.2,1), axes=False)
q2=parametric_plot((t,-0.1-0.1*t), (t,-1.2,1), axes=False)
q3=parametric_plot((t,-0.2-0.2*t), (t,-1.2,1), axes=False)
q4=parametric_plot((t,-0.3-0.3*t), (t,-1.2,1), axes=False)
q5=parametric_plot((t,-0.4-0.4*t), (t,-1.2,1), axes=False)
q6=parametric_plot((t,-0.5-0.5*t), (t,-1.2,1), axes=False)
q7=parametric_plot((t,-0.6-0.6*t), (t,-1.2,1), axes=False)
q8=parametric_plot((t,-0.7-0.7*t), (t,-1.2,1), axes=False)
q9=parametric_plot((t,-0.8-0.8*t), (t,-1.2,1), axes=False)
qA=parametric_plot((t,-0.9-0.9*t), (t,-1.2,1), axes=False)
qB=parametric_plot((t,-1-1*t), (t,-1.2,1), axes=False)
Ax=parametric_plot(vector((t,0)), (t,-1.1,1.1), thickness=3, color='black')
Ay=parametric_plot(vector((0,t)), (t,-1.1,1.1), thickness=3, color='black')
show(p1+p2+p3+p4+p5+p6+p7+p8+p9+pA+pB+q1+q2+q3+q4+q5+q6+q7+q8+q9+qA+qB+Ax+Ay)



B.5.3 Figure 4.2(b)



x=var('x')
y=var('y')
t=var('t')
# lines y = a + a^2 x for a = 0, 0.1, ..., 1
p1=parametric_plot((t,0), (t,-1.2,1), axes=False)
p2=parametric_plot((t,0.1+0.1^2*t), (t,-1.2,1), axes=False)
p3=parametric_plot((t,0.2+0.2^2*t), (t,-1.2,1), axes=False)
p4=parametric_plot((t,0.3+0.3^2*t), (t,-1.2,1), axes=False)
p5=parametric_plot((t,0.4+0.4^2*t), (t,-1.2,1), axes=False)
p6=parametric_plot((t,0.5+0.5^2*t), (t,-1.2,1), axes=False)
p7=parametric_plot((t,0.6+0.6^2*t), (t,-1.2,1), axes=False)
p8=parametric_plot((t,0.7+0.7^2*t), (t,-1.2,1), axes=False)
p9=parametric_plot((t,0.8+0.8^2*t), (t,-1.2,1), axes=False)
pA=parametric_plot((t,0.9+0.9^2*t), (t,-1.2,1), axes=False)
pB=parametric_plot((t,1+1^2*t), (t,-1.2,1), axes=False)
# lines y = -a + a^2 x for a = 0, 0.1, ..., 1
q1=parametric_plot((t,0), (t,-1.2,1), axes=False)
q2=parametric_plot((t,-0.1+0.1^2*t), (t,-1.2,1), axes=False)
q3=parametric_plot((t,-0.2+0.2^2*t), (t,-1.2,1), axes=False)
q4=parametric_plot((t,-0.3+0.3^2*t), (t,-1.2,1), axes=False)
q5=parametric_plot((t,-0.4+0.4^2*t), (t,-1.2,1), axes=False)
q6=parametric_plot((t,-0.5+0.5^2*t), (t,-1.2,1), axes=False)
q7=parametric_plot((t,-0.6+0.6^2*t), (t,-1.2,1), axes=False)
q8=parametric_plot((t,-0.7+0.7^2*t), (t,-1.2,1), axes=False)
q9=parametric_plot((t,-0.8+0.8^2*t), (t,-1.2,1), axes=False)
qA=parametric_plot((t,-0.9+0.9^2*t), (t,-1.2,1), axes=False)
qB=parametric_plot((t,-1+1^2*t), (t,-1.2,1), axes=False)
Ax=parametric_plot(vector((t,0)), (t,-1.1,1.1), thickness=3, color='black')
Ay=parametric_plot(vector((0,t)), (t,-1.1,1.1), thickness=3, color='black')
# the envelope 1 + 4xy = 0 of both families, drawn in red
r=implicit_plot(1+4*x*y==0, (x,-1,1), (y,-1,1), color='red', frame=False)
show(p1+p2+p3+p4+p5+p6+p7+p8+p9+pA+pB+q1+q2+q3+q4+q5+q6+q7+q8+q9+qA+qB+Ax+Ay+r)

B.6 Figure 4.4



t=var('t')
# the ellipse (2 cos t, sin t), and curves displaced from it by multiples of (cos t, 2 sin t)/(1 + sin^2 t)
p_1=parametric_plot((2*cos(t),sin(t)), (t,0,2*pi))
p_2=parametric_plot((2*cos(t)-0.2*cos(t)/(1+sin(t)*sin(t)), sin(t)-0.2*2*sin(t)/(1+sin(t)*sin(t))), (t,0,2*pi))
p_3=parametric_plot((2*cos(t)-0.5*cos(t)/(1+sin(t)*sin(t)), sin(t)-0.5*2*sin(t)/(1+sin(t)*sin(t))), (t,0,2*pi))
p_4=parametric_plot((2*cos(t)-0.7*cos(t)/(1+sin(t)*sin(t)), sin(t)-0.7*2*sin(t)/(1+sin(t)*sin(t))), (t,0,2*pi))
p_5=parametric_plot((2*cos(t)-1*cos(t)/(1+sin(t)*sin(t)), sin(t)-1*2*sin(t)/(1+sin(t)*sin(t))), (t,0,2*pi))
p_6=parametric_plot((2*cos(t)-1.2*cos(t)/(1+sin(t)*sin(t)), sin(t)-1.2*2*sin(t)/(1+sin(t)*sin(t))), (t,0,2*pi))
show(p_1+p_2+p_3+p_4+p_5+p_6)



